Go library for local LLM inference, like llama.cpp
This project provides a pure Golang implementation of the LLaMA inference engine, aiming to make large language models more accessible to developers without requiring deep C++ expertise or specialized hardware. It targets ML enthusiasts and developers looking to integrate LLM capabilities into Golang applications.
How It Works
llama.go reimplements the core logic of ggml.cpp in Go, focusing on performance and ease of use. It handles the tensor math and the LLaMA neural network architecture natively, leveraging multi-threading and platform-specific optimizations such as AVX2 and ARM NEON for improved inference speed on CPUs.
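To make the multi-threading concrete, here is a minimal, hypothetical sketch (not code from llama.go itself) of how a matrix-vector multiply, the workhorse operation of transformer inference, can be split across goroutines:

package main

import (
	"fmt"
	"runtime"
	"sync"
)

// matVec computes dst = W * x for a rows×cols matrix stored row-major,
// splitting the rows across all available CPU cores. Illustrative only;
// llama.go's actual kernels are more elaborate.
func matVec(dst, w, x []float32, rows, cols int) {
	workers := runtime.NumCPU()
	chunk := (rows + workers - 1) / workers

	var wg sync.WaitGroup
	for start := 0; start < rows; start += chunk {
		end := start + chunk
		if end > rows {
			end = rows
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for r := lo; r < hi; r++ {
				var sum float32
				row := w[r*cols : (r+1)*cols]
				for c, v := range row {
					sum += v * x[c]
				}
				dst[r] = sum
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	w := []float32{1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1} // 4×3 matrix
	x := []float32{2, 3, 4}
	dst := make([]float32, 4)
	matVec(dst, w, x, 4, 3)
	fmt.Println(dst) // [2 3 4 9]
}

Partitioning by rows means each goroutine writes to a disjoint range of dst, so no locking is needed.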
Quick Start & Requirements
Build from source with go build -o llama-go main.go, or use a prebuilt release binary. Models must first be converted to the project's FP32 format (e.g., llama-7b-fp32.bin). Requires significant RAM (32GB+ for 7B models). Example run:
./llama-go-v1.4.0-macos --model ~/models/llama-7b-fp32.bin --prompt "Your prompt here"
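To drive inference from a larger Go application without binding to internal APIs, one simple option is to shell out to the binary with the flags shown above. A minimal sketch, assuming a release binary and a local model path (both placeholders for your setup):

package main

import (
	"fmt"
	"log"
	"os/exec"
)

// Runs the prebuilt llama.go binary as a subprocess using the documented
// --model and --prompt flags. Binary name and model path are placeholders.
func main() {
	out, err := exec.Command(
		"./llama-go-v1.4.0-macos",
		"--model", "/path/to/llama-7b-fp32.bin",
		"--prompt", "Why is the sky blue?",
	).CombinedOutput()
	if err != nil {
		log.Fatalf("inference failed: %v\n%s", err, out)
	}
	fmt.Printf("%s", out)
}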
Highlighted Details
Core tensor operations are ported from ggml.cpp to pure Go, so no C++ toolchain or cgo is required.
Maintenance & Community
A roadmap outlines planned support for LLaMA V2, the GGUF format, INT8 quantization, and GPU acceleration (Nvidia/AMD) in future versions; however, the repository has seen no activity for roughly 10 months and is currently marked inactive.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Currently supports only FP32 weights for LLaMA models and lacks GPU acceleration. The project requires substantial system RAM, not VRAM, for model loading. Obtaining and converting original LLaMA models is a prerequisite.
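A quick back-of-the-envelope check explains the RAM guidance above: FP32 stores each weight in 4 bytes, so a 7B-parameter model occupies roughly 26 GiB for the weights alone, before activations and runtime overhead.

package main

import "fmt"

// Estimates the memory footprint of FP32 LLaMA weights: parameter count
// times 4 bytes per weight, expressed in GiB. Activations, KV cache, and
// runtime overhead push the practical requirement higher (hence 32GB+).
func main() {
	const params = 7e9         // 7B-parameter model
	const bytesPerWeight = 4.0 // FP32
	gib := params * bytesPerWeight / (1 << 30)
	fmt.Printf("~%.1f GiB for weights alone\n", gib)
}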