nano-llama31 by karpathy

Minimal Llama 3.1 implementation for training, finetuning, and inference

created 1 year ago
1,410 stars

Top 29.4% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a minimal, dependency-free implementation of the Llama 3.1 architecture, inspired by the nanoGPT project. It aims to simplify training, fine-tuning, and inference for the Llama 3.1 8B base model, offering a cleaner alternative to the official Meta and Hugging Face releases. It targets users who want a more streamlined and understandable codebase for working with Llama 3.1.

How It Works

The project replicates the Llama 3.1 architecture in a single PyTorch file (llama31.py), focusing on clarity and minimal dependencies. It adapts and simplifies code from Meta's official release, checking functional parity by comparing outputs against the reference implementation. This makes the model's components easier to understand and modify.
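
To give a feel for what a single-file implementation like llama31.py contains, here is a minimal, illustrative PyTorch sketch of two Llama-style building blocks (RMSNorm and the SwiGLU feed-forward). The class and argument names are assumptions for illustration, not the repository's actual code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RMSNorm(nn.Module):
        # Root-mean-square layer norm: rescale by 1/RMS(x); no mean subtraction, no bias.
        def __init__(self, dim: int, eps: float = 1e-5):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x * rms)

    class SwiGLUFeedForward(nn.Module):
        # Llama-style gated MLP: down(silu(gate(x)) * up(x)), all projections without bias.
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
            self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
            self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

    # Tiny smoke test with toy dimensions.
    x = torch.randn(2, 8, 64)
    print(SwiGLUFeedForward(64, 256)(RMSNorm(64)(x)).shape)  # torch.Size([2, 8, 64])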

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n llama31 python=3.10, then conda activate llama31), clone the official llama-models repo, download the Llama 3.1 8B base model, and install llama-models (pip install -r requirements.txt, pip install -e .). Then run inference with torchrun --nnodes 1 --nproc_per_node 1 reference.py --ckpt_dir <path_to_model> --tokenizer_path <path_to_model> (a quick checkpoint sanity check is sketched after this list).
  • Prerequisites: Python 3.10 (newer versions may hit PyTorch compatibility issues), PyTorch, and approved access from Meta to download the Llama 3.1 models.
  • Resources: Downloading the 8B model requires ~16GB of disk space; fine-tuning requires significant VRAM (e.g., an 80GB GPU for RMSNorm training).
  • Links: Official Llama 3.1 Model Access
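
As a quick sanity check once the weights are downloaded, a short script can load the checkpoint on CPU and inspect a few tensor shapes. The directory path and file names below (consolidated.00.pth, params.json) are assumptions about the usual Meta checkpoint layout; adjust them to whatever the download actually produced.

    import json
    from pathlib import Path
    import torch

    ckpt_dir = Path("Llama3.1-8B")  # hypothetical path to the downloaded model directory

    # Model config shipped alongside the weights (assumed file name).
    params = json.loads((ckpt_dir / "params.json").read_text())
    print("model config:", params)

    # Load the weights on CPU just to inspect shapes (the full 8B state dict needs ~16GB of RAM).
    state_dict = torch.load(ckpt_dir / "consolidated.00.pth", map_location="cpu")
    for name, tensor in list(state_dict.items())[:5]:
        print(f"{name}: {tuple(tensor.shape)} {tensor.dtype}")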

Highlighted Details

  • Minimal, dependency-free PyTorch implementation of Llama 3.1.
  • Verified functional parity with official Meta inference code.
  • Includes a fix for the trailing whitespace bug present in Meta's example_text_completion.py.
  • Early-stage fine-tuning capabilities demonstrated on the Tiny Stories dataset.
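
For a sense of what early-stage fine-tuning on a dataset like Tiny Stories boils down to, here is a generic next-token training step in PyTorch with a toy stand-in model. It is not the repository's training loop; every name here is illustrative.

    import torch
    import torch.nn.functional as F

    # Toy stand-in for a language model: embedding + linear head (illustrative only).
    vocab_size, dim = 128, 64
    model = torch.nn.Sequential(
        torch.nn.Embedding(vocab_size, dim),
        torch.nn.Linear(dim, vocab_size),
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    tokens = torch.randint(0, vocab_size, (4, 33))   # fake batch of token ids
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # train to predict the next token

    logits = model(inputs)                           # (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {loss.item():.3f}")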

Maintenance & Community

Marked as "WIP" and "not ready for prime time." The README indicates ongoing work to add features, improve fine-tuning, and support chat models, though repository activity has slowed (see Health Check below).

Licensing & Compatibility

The README does not explicitly state the license for this repository. It relies on Meta's official Llama 3.1 models, which have their own usage terms. Compatibility with commercial or closed-source projects would depend on the underlying Llama 3.1 license.

Limitations & Caveats

The project is explicitly marked as "WIP" and "not ready for prime time." Fine-tuning is still considered broken, with specific issues noted around attention masking for BOS tokens and KV cache usage during training; support for models larger than 8B and for chat models is pending. The code also emits a deprecation warning for torch.set_default_tensor_type (newer PyTorch versions prefer torch.set_default_dtype / torch.set_default_device).
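
To illustrate the attention-masking issue mentioned above, here is a hedged sketch of a block-diagonal ("document") mask that prevents tokens from attending across BOS boundaries in a packed training sequence. It shows the general technique only, not the repository's actual fix, and the BOS token id is a placeholder.

    import torch

    def document_mask(token_ids: torch.Tensor, bos_id: int) -> torch.Tensor:
        # token_ids: (seq_len,) packed sequence in which every document starts with bos_id.
        # Returns a (seq_len, seq_len) boolean mask, True where attention is allowed.
        doc_ids = (token_ids == bos_id).cumsum(dim=0)            # document index of each token
        same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)  # attend only within a document
        causal = torch.tril(torch.ones(len(token_ids), len(token_ids), dtype=torch.bool))
        return same_doc & causal

    # Example: two packed documents, BOS id assumed to be 1.
    ids = torch.tensor([1, 5, 6, 1, 7, 8, 9])
    print(document_mask(ids, bos_id=1).int())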

Health Check
Last commit: 11 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 53 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 3 more.

LLaMA-Adapter by OpenGVLab

0.0% · 6k stars
Efficient fine-tuning for instruction-following LLaMA models
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 10 more.

open_llama by openlm-research

0.0% · 8k stars
Open-source reproduction of LLaMA models
created 2 years ago
updated 2 years ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 2 more.

lit-llama by Lightning-AI

0.1% · 6k stars
LLaMA implementation for pretraining, finetuning, and inference
created 2 years ago
updated 1 month ago