llama.go  by gotzmann

Go library for local LLM inference, like llama.cpp

created 2 years ago
1,372 stars

Top 30.0% on sourcepulse

Project Summary

This project provides a pure Golang implementation of the LLaMA inference engine, aiming to make large language models more accessible to developers without requiring deep C++ expertise or specialized hardware. It targets ML enthusiasts and developers looking to integrate LLM capabilities into Golang applications.

How It Works

llama.go reimplements the core logic of ggml.cpp in Go, focusing on performance and ease of use. It handles tensor math and the LLaMA neural network architecture directly in Go, leveraging multi-threading and platform-specific optimizations like AVX2 and ARM NEON for improved inference speed on CPUs.
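To make the multi-threaded tensor math concrete, here is a minimal sketch of a matrix-vector product split across goroutines. It is an illustration only, not code from llama.go: the function name matVec, the row-major layout, and the row-chunking strategy are all assumptions, and it uses plain scalar Go rather than the AVX2 or NEON paths the project mentions.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// matVec (hypothetical helper) computes y = W*x for a row-major matrix W
// of size rows x cols, dividing the rows among worker goroutines.
func matVec(w []float32, rows, cols int, x []float32, workers int) []float32 {
	y := make([]float32, rows)
	if workers < 1 {
		workers = 1
	}
	// Ceiling division so every row is assigned to exactly one chunk.
	chunk := (rows + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < rows; start += chunk {
		end := start + chunk
		if end > rows {
			end = rows
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for r := start; r < end; r++ {
				row := w[r*cols : (r+1)*cols]
				var sum float32
				for c, v := range row {
					sum += v * x[c]
				}
				// Each goroutine writes a disjoint range of y, so no lock is needed.
				y[r] = sum
			}
		}(start, end)
	}
	wg.Wait()
	return y
}

func main() {
	w := []float32{1, 2, 3, 4, 5, 6} // 2x3 matrix, row-major
	x := []float32{1, 0, -1}
	fmt.Println(matVec(w, 2, 3, x, runtime.NumCPU())) // prints [-2 -2]
}
```

Splitting by rows keeps each goroutine's writes independent, which is why a sync.WaitGroup is the only synchronization this sketch needs.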

Quick Start & Requirements

  • Install: Download pre-built binaries or build from source using go build -o llama-go main.go.
  • Prerequisites: Original LLaMA model files converted to FP32 (e.g., llama-7b-fp32.bin) and at least 32 GB of system RAM for 7B models.
  • Run (pre-built macOS binary shown; substitute ./llama-go if you built from source): ./llama-go-v1.4.0-macos --model ~/models/llama-7b-fp32.bin --prompt "Your prompt here"
  • Docs: https://github.com/gotzmann/llama.go
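The 32GB+ figure follows from simple arithmetic: FP32 stores each weight in 4 bytes, so a nominal 7 billion parameters need about 28 GB (roughly 26 GiB) for the weights alone, before activations and other runtime buffers. The sketch below works this out. The parameter counts are the nominal 7B and 13B figures, assumed here for illustration, and the helper names are hypothetical.

```go
package main

import "fmt"

// weightBytes (hypothetical helper) returns the bytes needed to hold
// the model weights alone, ignoring activations and runtime buffers.
func weightBytes(params, bytesPerParam uint64) uint64 {
	return params * bytesPerParam
}

// toGiB converts a byte count to gibibytes (2^30 bytes).
func toGiB(b uint64) float64 {
	return float64(b) / (1 << 30)
}

func main() {
	const fp32Bytes = 4 // one FP32 weight occupies 4 bytes

	// Nominal parameter counts, assumed for illustration.
	models := []struct {
		name   string
		params uint64
	}{
		{"7B", 7_000_000_000},
		{"13B", 13_000_000_000},
	}

	for _, m := range models {
		b := weightBytes(m.params, fp32Bytes)
		fmt.Printf("%s FP32 weights: ~%.1f GiB\n", m.name, toGiB(b))
	}
}
```

The same arithmetic suggests why INT8 quantization matters: at 1 byte per weight the footprint of the weights would drop to about a quarter.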

Highlighted Details

  • Pure Golang reimplementation of ggml.cpp.
  • Supports LLaMA 7B and 13B models (FP32 weights).
  • Includes an embedded REST API server mode for integration.
  • Optimizations for x64 AVX2 and ARM NEON.

Maintenance & Community

The README outlines a roadmap for future versions covering LLaMA V2, the GGUF format, INT8 quantization, and GPU acceleration (Nvidia/AMD). None of these are available yet, and the most recent commit was 10 months ago.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Only FP32 weights for LLaMA models are supported, and there is no GPU acceleration. Because inference runs on the CPU, the model is loaded into system RAM rather than VRAM, so substantial RAM is required. You must also obtain the original LLaMA models and convert them yourself before running anything.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

28 stars in the last 90 days

Explore Similar Projects

  • LLaMA-Adapter by OpenGVLab: Efficient fine-tuning for instruction-following LLaMA models. 0.0%, 6k stars; created 2 years ago, updated 1 year ago. Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.
  • llama.cpp by ggml-org: C/C++ library for local LLM inference. 0.4%, 84k stars; created 2 years ago, updated 14 hours ago. Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.