Minimal Llama 3.1 implementation for training, finetuning, and inference
This repository provides a minimal, dependency-free implementation of the Llama 3.1 architecture, inspired by the nanoGPT project. It aims to simplify training, fine-tuning, and inference for the Llama 3.1 8B base model, offering a cleaner alternative to the official Meta and Hugging Face releases. The project is actively developed and targets users who need a more streamlined and understandable codebase for working with Llama 3.1.
How It Works
The project replicates the Llama 3.1 architecture in a single PyTorch file (llama31.py), focusing on clarity and minimal dependencies. It achieves this by adapting and simplifying code from Meta's official release, ensuring functional parity through testing against the reference implementation. This approach makes the model's components easier to understand and modify.
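For orientation, the sketch below shows the kind of building block such a single-file implementation contains, using RMSNorm (one of the standard Llama components) as an example; the class and parameter names here are illustrative, not the repo's actual API.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm, as used throughout the Llama architecture."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the last dimension, then scale.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```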
Quick Start & Requirements
Create and activate a conda environment (conda create -n llama31 python=3.10, conda activate llama31), clone the official llama-models repo, download the Llama 3.1 8B model, install llama-models (pip install -r requirements.txt, pip install -e .), and then run inference with torchrun --nnodes 1 --nproc_per_node 1 reference.py --ckpt_dir <path_to_model> --tokenizer_path <path_to_model>.
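Collected as shell commands, the steps above look roughly like this; the clone URL and directory layout are assumptions, and the model paths are placeholders you must fill in after downloading the checkpoint.

```bash
# Environment setup
conda create -n llama31 python=3.10
conda activate llama31

# Get and install Meta's llama-models repo (URL assumed)
git clone https://github.com/meta-llama/llama-models.git
cd llama-models
pip install -r requirements.txt
pip install -e .
cd ..

# Download the Llama 3.1 8B base model per Meta's instructions, then run inference
torchrun --nnodes 1 --nproc_per_node 1 reference.py \
  --ckpt_dir <path_to_model> --tokenizer_path <path_to_model>
```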
Highlighted Details
The inference entry point, reference.py, is adapted from Meta's example_text_completion.py.
Maintenance & Community
Actively developed, but marked as "WIP" and "not ready for prime time." The README indicates ongoing work to add features, improve fine-tuning, and support chat models.
Licensing & Compatibility
The README does not explicitly state the license for this repository. It relies on Meta's official Llama 3.1 models, which have their own usage terms. Compatibility with commercial or closed-source projects would depend on the underlying Llama 3.1 license.
Limitations & Caveats
The project is explicitly marked as "WIP" and "not ready for prime time." Fine-tuning is still considered broken, with specific issues noted regarding attention masking for BOS tokens and KV cache usage during training. Support for models larger than 8B and for chat models is pending. A warning about the deprecated set_default_tensor_type API is present.
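For context, torch.set_default_tensor_type is deprecated in recent PyTorch releases in favor of torch.set_default_dtype and torch.set_default_device; the sketch below shows the replacement pattern (the README does not show the repo's exact call, so the dtype and device here are illustrative).

```python
import torch

# Deprecated style (emits a warning on recent PyTorch):
# torch.set_default_tensor_type(torch.cuda.HalfTensor)

# Current replacement: set the default dtype and device separately.
torch.set_default_dtype(torch.float16)
torch.set_default_device("cuda")

x = torch.empty(4)  # float16 tensor allocated on the default CUDA device
```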