Llama3 inference walkthrough, step-by-step
This repository provides a step-by-step, from-scratch implementation of Llama 3 inference, aimed at engineers and researchers who want to understand the model's mechanics in depth. It offers detailed code annotations, explicit dimension tracking, and explanations of the underlying principles, including a dedicated section on the KV-Cache, to support a thorough grasp of Llama 3's architecture and operation.
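For context on the KV-Cache section, here is a minimal sketch of the idea in PyTorch. The class and method names are hypothetical illustrations, not the repository's actual code: during decoding, keys and values for past tokens are cached so each new token computes only its own K/V and attends over the accumulated cache.

```python
import torch

class KVCache:
    """Hypothetical sketch: grow cached key/value tensors one decode step at a time."""

    def __init__(self):
        self.keys = None     # (seq_len, n_kv_heads, head_dim)
        self.values = None

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        # Append this step's keys/values instead of recomputing
        # them for every previous token on each decode step.
        if self.keys is None:
            self.keys, self.values = k_new, v_new
        else:
            self.keys = torch.cat([self.keys, k_new], dim=0)
            self.values = torch.cat([self.values, v_new], dim=0)
        return self.keys, self.values
```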
How It Works
The project reconstructs Llama 3's inference pipeline component by component. It starts with tokenization and embedding, then details RMS normalization (RMSNorm), Rotary Position Embedding (RoPE) for positional information, and Grouped Query Attention (GQA) as the model's multi-head attention variant. The implementation covers the Feed-Forward Network (FFN) with SwiGLU activation and the residual connections around each sub-layer, culminating in the final prediction head. Throughout, clarity is emphasized via extensive inline comments and explicit dimension tracking at each step.
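To make the pipeline concrete, here is a minimal, self-contained sketch of one Llama-style layer in PyTorch, combining RMSNorm, RoPE, GQA, and a SwiGLU FFN with residual connections. Weight names, shapes, and hyperparameters are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: scale by the reciprocal root-mean-square; no mean subtraction.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

def apply_rope(x, positions, theta=500000.0):
    # RoPE: rotate consecutive channel pairs by a position-dependent angle.
    # x: (seq, heads, head_dim); theta=500000 matches Llama 3's base frequency.
    head_dim = x.shape[-1]
    freqs = 1.0 / theta ** (torch.arange(0, head_dim, 2).float() / head_dim)
    angles = positions[:, None].float() * freqs[None, :]   # (seq, head_dim/2)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return rotated.flatten(-2)

def llama_layer(x, wq, wk, wv, wo, w_gate, w_up, w_down, norm1, norm2,
                n_heads=8, n_kv_heads=2):
    # One layer: pre-norm GQA attention and a SwiGLU FFN,
    # each wrapped in a residual connection.
    seq, dim = x.shape
    head_dim = dim // n_heads
    positions = torch.arange(seq)

    h = rms_norm(x, norm1)
    q = (h @ wq.T).view(seq, n_heads, head_dim)
    k = (h @ wk.T).view(seq, n_kv_heads, head_dim)
    v = (h @ wv.T).view(seq, n_kv_heads, head_dim)
    q, k = apply_rope(q, positions), apply_rope(k, positions)

    # GQA: each group of query heads shares one key/value head.
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = torch.einsum("qhd,khd->hqk", q, k) / head_dim ** 0.5
    mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
    attn = torch.softmax(scores + mask, dim=-1)           # causal attention
    out = torch.einsum("hqk,khd->qhd", attn, v).reshape(seq, dim)
    x = x + out @ wo.T                                    # residual 1

    # SwiGLU FFN: silu(h @ W_gate) gates (h @ W_up), then project back down.
    h = rms_norm(x, norm2)
    ffn = (F.silu(h @ w_gate.T) * (h @ w_up.T)) @ w_down.T
    return x + ffn                                        # residual 2

# Example with small random weights (dim=64, ffn_dim=128 for brevity):
dim, ffn = 64, 128
w = lambda *s: torch.randn(*s) * 0.02
x = torch.randn(10, dim)
y = llama_layer(x, w(dim, dim), w(dim // 4, dim), w(dim // 4, dim), w(dim, dim),
                w(ffn, dim), w(ffn, dim), w(dim, ffn),
                torch.ones(dim), torch.ones(dim))
```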
Quick Start & Requirements
Install dependencies with `pip install -r requirements.txt`. Download the Meta-Llama-3-8B weights and place them under the `Meta-Llama-3-8B/original/` directory. The walkthrough is organized into per-component scripts (e.g., `load_model.py`, `attention.py`).
Highlighted Details
Maintenance & Community
The project is based on naklecha/llama3-from-scratch and has been significantly enhanced. It appears to be a personal project with contributions from the original author; community interaction channels are not explicitly mentioned.
Licensing & Compatibility
The repository is licensed under the MIT License. This license is permissive and allows for commercial use and integration into closed-source projects.
Limitations & Caveats
This project focuses solely on inference and does not include training code. It requires manual download of model weights, and the code is structured for educational purposes rather than production deployment.