llama-nuts-and-bolts by adalkiran

Educational Llama 3.1 inference implementation

created 1 year ago
311 stars

Top 87.6% on sourcepulse

View on GitHub
Project Summary

This project provides a deep dive into the practical implementation of the Llama 3.1 8B-Instruct model, targeting engineers and researchers seeking to understand LLM internals beyond theoretical concepts. It offers a complete, dependency-free reimplementation of the model's inference pipeline in Go, enabling a granular understanding of each component.

How It Works

The project reconstructs Llama 3.1's architecture and inference process from the ground up, avoiding external libraries. It implements core functionality such as the BFloat16 data type, memory mapping, tokenization, tensor operations, and rotary positional embeddings (RoPE) entirely in Go. Computations are parallelized across CPU cores with goroutines; GPGPU and SIMD acceleration are deliberately left out for educational clarity.
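
To make the BFloat16 piece concrete, here is a minimal Go sketch of the conversion, not code from the project itself (the bf16 type and function names are hypothetical). BFloat16 keeps a float32's sign bit, full 8-bit exponent, and top 7 mantissa bits, so converting is essentially a 16-bit shift plus rounding:

    package main

    import (
    	"fmt"
    	"math"
    )

    // bf16 stores the upper 16 bits of an IEEE-754 float32:
    // 1 sign bit, 8 exponent bits, 7 mantissa bits.
    type bf16 uint16

    // fromFloat32 rounds to nearest even before dropping the low
    // 16 mantissa bits. (NaN handling is omitted for brevity.)
    func fromFloat32(f float32) bf16 {
    	bits := math.Float32bits(f)
    	bits += 0x7FFF + (bits>>16)&1
    	return bf16(bits >> 16)
    }

    // toFloat32 widens by zero-filling the discarded mantissa bits.
    func (b bf16) toFloat32() float32 {
    	return math.Float32frombits(uint32(b) << 16)
    }

    func main() {
    	x := float32(3.14159)
    	b := fromFloat32(x)
    	// Some precision loss is expected: only 7 mantissa bits survive.
    	fmt.Printf("%g -> 0x%04X -> %g\n", x, uint16(b), b.toFloat32())
    }

Plain truncation (dropping the low 16 bits outright) also works, but round-to-nearest-even is the usual choice because it halves the average rounding error.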

Quick Start & Requirements

  • Installation: Clone the repository and build the Go executable (go build -o llama-nb cmd/main.go) or use Docker (docker-compose up -d).
  • Prerequisites: Go toolchain, wget, md5sum.
  • Model Download: Requires downloading the official Llama 3.1 8B-Instruct model files (~16 GB) from Meta, following the instructions in the README.
  • Documentation: GitHub Pages

Highlighted Details

  • Implements Llama 3.1 8B-Instruct inference entirely in Go, without Python or external ML libraries.
  • Covers BFloat16 implementation, memory mapping, RoPE, and custom tensor operations (a RoPE sketch follows this list).
  • Supports CPU-based inference with parallelization via goroutines.
  • Provides a CLI for predefined or custom prompts with streaming output.
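
As an illustration of the RoPE bullet above, here is a minimal Go sketch of the standard rotary-embedding computation, not the project's own code (applyRoPE is a hypothetical name, and Llama 3.1's long-context frequency scaling is omitted):

    package main

    import (
    	"fmt"
    	"math"
    )

    // applyRoPE rotates consecutive (even, odd) element pairs of a
    // query/key vector in place, by position-dependent angles
    // theta_i = pos / base^(i/d) -- the standard RoPE formulation.
    func applyRoPE(vec []float32, pos int, base float64) {
    	d := len(vec)
    	for i := 0; i < d; i += 2 {
    		theta := float64(pos) / math.Pow(base, float64(i)/float64(d))
    		sin, cos := math.Sincos(theta)
    		x0, x1 := float64(vec[i]), float64(vec[i+1])
    		vec[i] = float32(x0*cos - x1*sin)
    		vec[i+1] = float32(x0*sin + x1*cos)
    	}
    }

    func main() {
    	v := []float32{1, 0, 1, 0}
    	applyRoPE(v, 3, 10000.0) // position 3, classic RoPE base
    	fmt.Println(v)
    }

Because each pair rotates at a different frequency, nearby positions get similar embeddings while distant ones diverge, which is what lets attention scores encode relative position.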

Maintenance & Community

The project is maintained by adalkiran. Further community engagement details are not specified in the README.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. This license is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

The project is explicitly educational and has not been tested for production or commercial use. It lacks GPGPU/SIMD support and implements no sampling techniques such as top-k or temperature: decoding is greedy, emitting only the highest-probability token at each step (see the sketch below). Functionality is tailored specifically to the Llama 3.1 8B-Instruct model.
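
For reference, greedy decoding reduces to an argmax over the model's output logits. A minimal Go sketch (the function name is hypothetical, not taken from the project):

    package main

    import "fmt"

    // nextToken returns the index of the highest logit: greedy decoding.
    func nextToken(logits []float32) int {
    	best := 0
    	for i, v := range logits {
    		if v > logits[best] {
    			best = i
    		}
    	}
    	return best
    }

    func main() {
    	fmt.Println(nextToken([]float32{0.1, 2.3, -0.7})) // prints 1
    }

Temperature or top-k sampling would instead reshape the logits into a probability distribution and draw from it, trading the determinism of greedy decoding for output diversity.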

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

8 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

C/C++ library for local LLM inference

  • 84k stars · Top 0.4% on sourcepulse
  • created 2 years ago · updated 19 hours ago