llama.go by gotzmann

Go library for local LLM inference, like llama.cpp

Created 2 years ago

1,396 stars

Top 28.8% on SourcePulse


1 Expert Loves This Project

Georgi Gerganov

Author of llama.cpp, whisper.cpp

Project Summary

This project provides a pure Golang implementation of LLaMA model inference, aiming to make large language models accessible to developers without requiring C++ expertise or specialized hardware. It targets ML enthusiasts and developers who want to integrate LLM capabilities into Golang applications.

How It Works

llama.go reimplements the core logic of ggml.cpp in Go, with a focus on performance and ease of use. It implements tensor math and the LLaMA network architecture directly in Go, using multi-threading and platform-specific SIMD optimizations (AVX2 on x64, NEON on ARM) to speed up inference on CPUs.
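Much of the CPU work in an inference engine like this is matrix-vector multiplication. The Go sketch below illustrates the multi-threading idea in general terms by splitting the rows of a row-major matrix across goroutines. It is a hypothetical example, not code taken from llama.go; the function name `matVec` is an illustration, and the AVX2/NEON SIMD paths are omitted in favor of plain scalar loops.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// matVec computes out = W·x for a row-major matrix W (rows×cols).
// Hypothetical illustration: rows are split into contiguous chunks,
// one goroutine per chunk. Scalar loops only, no SIMD.
func matVec(w []float32, rows, cols int, x []float32, workers int) []float32 {
	out := make([]float32, rows)
	if workers < 1 {
		workers = 1
	}
	chunk := (rows + workers - 1) / workers // ceil(rows / workers)
	var wg sync.WaitGroup
	for start := 0; start < rows; start += chunk {
		end := start + chunk
		if end > rows {
			end = rows
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for r := start; r < end; r++ {
				row := w[r*cols : (r+1)*cols]
				var sum float32
				for c, v := range row {
					sum += v * x[c]
				}
				out[r] = sum // each goroutine writes a disjoint range of out
			}
		}(start, end)
	}
	wg.Wait() // block until every chunk is done
	return out
}

func main() {
	// A 2×3 matrix times a 3-vector of ones.
	w := []float32{1, 2, 3, 4, 5, 6}
	x := []float32{1, 1, 1}
	fmt.Println(matVec(w, 2, 3, x, runtime.NumCPU()))
}
```

Because each goroutine writes its own slice of `out`, no mutex is needed; the `sync.WaitGroup` only waits for all workers to finish.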

Quick Start & Requirements

Install: Download pre-built binaries or build from source using go build -o llama-go main.go.
Prerequisites: Original LLaMA model files (e.g., llama-7b-fp32.bin). Requires significant RAM (32GB+ for 7B models).
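The RAM requirement follows from simple arithmetic: FP32 stores each weight in 4 bytes, so the raw weight size is roughly the parameter count × 4. The sketch below assumes nominal counts of 7 and 13 billion parameters (actual counts differ slightly) and ignores activations, the KV cache, and runtime overhead, all of which add to the total.

```go
package main

import "fmt"

// weightBytes returns the raw storage for a model's weights:
// parameter count × bytes per parameter.
func weightBytes(params, bytesPerParam float64) float64 {
	return params * bytesPerParam
}

func main() {
	const fp32 = 4.0 // bytes per FP32 weight
	models := []struct {
		name   string
		params float64 // nominal parameter count (assumption)
	}{
		{"7B", 7e9},
		{"13B", 13e9},
	}
	for _, m := range models {
		b := weightBytes(m.params, fp32)
		fmt.Printf("%s: %.0f GB (%.1f GiB) of FP32 weights\n", m.name, b/1e9, b/(1<<30))
	}
}
```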
Run: ./llama-go-v1.4.0-macos --model ~/models/llama-7b-fp32.bin --prompt "Your prompt here"
Docs: https://github.com/gotzmann/llama.go

Highlighted Details

Pure Golang reimplementation of ggml.cpp.
Supports LLaMA 7B and 13B models (FP32 weights).
Includes an embedded REST API server mode for integration.
Optimizations for x64 AVX2 and ARM NEON.

Maintenance & Community

The README lays out a roadmap that includes support for LLaMA V2, the GGUF format, INT8 quantization, and GPU acceleration (Nvidia/AMD) in future versions, although the most recent commit was about a year ago.

Licensing & Compatibility

The README does not explicitly state a license, so suitability for commercial use or closed-source linking is unclear.

Limitations & Caveats

Currently supports only FP32 weights for LLaMA models and lacks GPU acceleration. The project requires substantial system RAM, not VRAM, for model loading. Obtaining and converting original LLaMA models is a prerequisite.

Health Check

Last Commit

1 year ago

Responsiveness

1 day

Star History

2 stars in the last 30 days