Minimal Llama 3.1 implementation for training, finetuning, and inference
This repository provides a minimal, dependency-free implementation of the Llama 3.1 architecture, inspired by the nanoGPT project. It aims to simplify training, fine-tuning, and inference for the Llama 3.1 8B base model, offering a cleaner alternative to the official Meta and Hugging Face releases. The project is actively developed and targets users who need a more streamlined and understandable codebase for working with Llama 3.1.
How It Works
The project replicates the Llama 3.1 architecture in a single PyTorch file (llama31.py), focusing on clarity and minimal dependencies. It achieves this by adapting and simplifying code from Meta's official release, ensuring functional parity through testing against the reference implementation. This approach makes the model's components easier to understand and modify.
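For orientation, the sketch below shows the kind of building block such a single-file implementation contains, using RMSNorm (one of the standard Llama components) as an example; the class and parameter names here are illustrative, not the repo's actual API.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm, as used throughout the Llama architecture."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the last dimension, then scale.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```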
Quick Start & Requirements
Create and activate a conda environment (conda create -n llama31 python=3.10, conda activate llama31), clone the official llama-models repo, download the Llama 3.1 8B model, install llama-models (pip install -r requirements.txt, pip install -e .), and then run inference with torchrun --nnodes 1 --nproc_per_node 1 reference.py --ckpt_dir <path_to_model> --tokenizer_path <path_to_model>.
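Collected as shell commands, the steps above look roughly like this; the clone URL and directory layout are assumptions, and the model paths are placeholders you must fill in after downloading the checkpoint.

```bash
# Environment setup
conda create -n llama31 python=3.10
conda activate llama31

# Get and install Meta's llama-models repo (URL assumed)
git clone https://github.com/meta-llama/llama-models.git
cd llama-models
pip install -r requirements.txt
pip install -e .
cd ..

# Download the Llama 3.1 8B base model per Meta's instructions, then run inference
torchrun --nnodes 1 --nproc_per_node 1 reference.py \
  --ckpt_dir <path_to_model> --tokenizer_path <path_to_model>
```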
Highlighted Details
The inference entry point, reference.py, is adapted from Meta's example_text_completion.py.
Maintenance & Community
Actively developed, but marked as "WIP" and "not ready for prime time." The README indicates ongoing work to add features, improve fine-tuning, and support chat models.
Licensing & Compatibility
The README does not explicitly state the license for this repository. It relies on Meta's official Llama 3.1 models, which have their own usage terms. Compatibility with commercial or closed-source projects would depend on the underlying Llama 3.1 license.
Limitations & Caveats
The project is explicitly marked as "WIP" and "not ready for prime time." Fine-tuning is still considered broken, with specific issues noted regarding attention masking for BOS tokens and KV cache usage during training. Support for models larger than 8B and for chat models is pending. A warning about the deprecated set_default_tensor_type API is present.
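For context, torch.set_default_tensor_type is deprecated in recent PyTorch releases in favor of torch.set_default_dtype and torch.set_default_device; the sketch below shows the replacement pattern (the README does not show the repo's exact call, so the dtype and device here are illustrative).

```python
import torch

# Deprecated style (emits a warning on recent PyTorch):
# torch.set_default_tensor_type(torch.cuda.HalfTensor)

# Current replacement: set the default dtype and device separately.
torch.set_default_dtype(torch.float16)
torch.set_default_device("cuda")

x = torch.empty(4)  # float16 tensor allocated on the default CUDA device
```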