llama.go by gotzmann

Go library for local LLM inference, like llama.cpp

Created 2 years ago
1,379 stars

Top 29.2% on SourcePulse

Project Summary

This project provides a pure Golang inference engine for LLaMA models. It aims to make large language models accessible to developers who lack deep C++ expertise or specialized hardware, and it targets ML enthusiasts and developers who want to integrate LLM capabilities into Go applications.

How It Works

llama.go reimplements the core logic of ggml.cpp in Go, focusing on performance and ease of use. It implements the tensor math and the LLaMA neural network architecture directly in Go, and speeds up CPU inference with multi-threading and platform-specific optimizations such as AVX2 (x64) and ARM NEON.
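To illustrate the multi-threading idea only, the sketch below splits a matrix-vector product (the operation that dominates per-token LLM inference) across goroutines, one contiguous block of rows per goroutine. The function `matVecParallel` is a hypothetical helper written for this summary, not code from llama.go, and it uses plain scalar loops rather than AVX2 or NEON instructions.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// matVecParallel computes out = W·x, where W is a rows×cols matrix stored
// row-major in a flat []float32. Rows are split into contiguous chunks and
// each chunk is processed by its own goroutine. (Hypothetical helper, not
// llama.go's API.)
func matVecParallel(w []float32, rows, cols int, x []float32, threads int) []float32 {
	if threads < 1 {
		threads = 1
	}
	out := make([]float32, rows)
	chunk := (rows + threads - 1) / threads // ceiling division so every row is covered
	var wg sync.WaitGroup
	for start := 0; start < rows; start += chunk {
		end := start + chunk
		if end > rows {
			end = rows
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for r := lo; r < hi; r++ {
				row := w[r*cols : (r+1)*cols]
				var sum float32
				for c, v := range row {
					sum += v * x[c]
				}
				// Each goroutine writes a disjoint range of out, so no lock is needed.
				out[r] = sum
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	// 3×2 matrix [[1 2] [3 4] [5 6]] times the vector [1 1].
	w := []float32{1, 2, 3, 4, 5, 6}
	x := []float32{1, 1}
	fmt.Println(matVecParallel(w, 3, 2, x, runtime.NumCPU())) // prints [3 7 11]
}
```

Splitting by rows keeps each goroutine's writes independent, which is why the sketch needs only a `sync.WaitGroup` and no mutex.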

Quick Start & Requirements

  • Install: Download pre-built binaries or build from source using go build -o llama-go main.go.
  • Prerequisites: Original LLaMA model files (e.g., llama-7b-fp32.bin). Requires significant RAM (32GB+ for 7B models).
  • Run: ./llama-go-v1.4.0-macos --model ~/models/llama-7b-fp32.bin --prompt "Your prompt here"
  • Docs: https://github.com/gotzmann/llama.go

Highlighted Details

  • Pure Golang reimplementation of ggml.cpp.
  • Supports LLaMA 7B and 13B models (FP32 weights).
  • Includes an embedded REST API server mode for integration.
  • Optimizations for x64 AVX2 and ARM NEON.

Maintenance & Community

The README outlines a roadmap that includes support for LLaMA V2, the GGUF format, INT8 quantization, and GPU acceleration (Nvidia/AMD) in future versions. Note, however, that the last commit was about a year ago (see Health Check below).

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project currently supports only FP32 weights for LLaMA models and has no GPU acceleration, so models are loaded into system RAM rather than VRAM and need a substantial amount of it. Users must obtain the original LLaMA models and convert them before use.
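The RAM requirement follows from simple arithmetic: FP32 stores each weight in 4 bytes. The sketch below assumes nominal parameter counts of 7 and 13 billion (the real checkpoints are slightly smaller) and counts only the weights, not the KV cache or other runtime buffers. The INT8 column is included only for comparison with the roadmap item and ignores per-block scale overhead; `weightBytes` is a hypothetical helper.

```go
package main

import "fmt"

// weightBytes returns the raw storage needed for nParams weights at the
// given precision. It ignores file headers, quantization scales, the KV
// cache and other runtime buffers, so real memory use is higher.
func weightBytes(nParams, bytesPerParam int64) int64 {
	return nParams * bytesPerParam
}

func main() {
	// Nominal parameter counts (an approximation of the real model sizes).
	models := []struct {
		name   string
		params int64
	}{
		{"7B", 7_000_000_000},
		{"13B", 13_000_000_000},
	}
	for _, m := range models {
		fp32 := weightBytes(m.params, 4) // float32: 4 bytes per weight
		q8 := weightBytes(m.params, 1)   // int8: 1 byte per weight
		fmt.Printf("%s: FP32 %.0f GB, INT8 %.0f GB\n",
			m.name, float64(fp32)/1e9, float64(q8)/1e9)
	}
}
```

Running it prints 28 GB of FP32 weights for the 7B model, which is why a 32GB+ machine is the practical minimum, and 52 GB for 13B.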

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
