llama3-from-scratch by naklecha

Llama3 implementation from scratch, one matrix multiplication at a time

Created 1 year ago
15,149 stars

Top 3.2% on SourcePulse

Project Summary

This repository provides a step-by-step, line-by-line implementation of Meta's Llama 3 8B model from scratch in a single Python file. It's designed for researchers and engineers who want a deep, practical understanding of transformer architectures and LLM internals, enabling them to dissect and potentially modify core components.

How It Works

The project meticulously reconstructs the Llama 3 architecture by loading the official Meta weights and processing them through fundamental PyTorch operations. It details each stage: tokenization with tiktoken, RMS normalization, RoPE positional embeddings implemented via complex-number multiplication, multi-head attention (including unwrapping the fused attention weights into per-head matrices and sharing key/value heads across query heads), and the SwiGLU feed-forward network. The implementation emphasizes clarity over optimization, showing how each matrix multiplication contributes to the overall forward pass.
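As a flavor of the approach, the RoPE step can be written as a handful of tensor operations. The sketch below is illustrative rather than the repository's exact code; the function name, the single-head shape, and the base of 10000 are assumptions.

```python
import torch

# Illustrative RoPE via complex-number multiplication for a single attention head.
def apply_rope(q: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    seq_len, head_dim = q.shape                                     # head_dim must be even
    # One rotation frequency per pair of dimensions
    freqs = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)      # [seq_len, head_dim // 2]
    rotations = torch.polar(torch.ones_like(angles), angles)        # unit-magnitude complex numbers
    # Interpret consecutive dimension pairs as complex numbers and rotate them
    q_complex = torch.view_as_complex(q.float().reshape(seq_len, -1, 2))
    q_rotated = torch.view_as_real(q_complex * rotations).reshape(seq_len, head_dim)
    return q_rotated.type_as(q)
```

The same rotation is applied to the key vectors, which is what makes the attention scores depend on relative token positions.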

Quick Start & Requirements

  • Install: Requires PyTorch (tokenization uses tiktoken).
  • Prerequisites: Download the Llama 3 8B weights from Meta's official download page; a minimal loading sketch follows this list.
  • Execution: Run the provided Python script.
  • Resources: Requires enough RAM to hold the full set of weights (roughly 16 GB for the 8B checkpoint in bfloat16).
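A minimal sketch of the weight-loading step referenced above, assuming the download was unpacked into a local Meta-Llama-3-8B/ directory (the path and the printed keys are illustrative, not taken from the repository):

```python
import json
import torch

model_dir = "Meta-Llama-3-8B/"  # hypothetical path to the downloaded checkpoint

# Architecture hyperparameters: dim, n_layers, n_heads, n_kv_heads, vocab_size, ...
with open(model_dir + "params.json") as f:
    config = json.load(f)

# Full state dict on CPU; the 8B checkpoint needs on the order of 16 GB of RAM.
weights = torch.load(model_dir + "consolidated.00.pth", map_location="cpu")

print(config)
print(list(weights.keys())[:5])  # token embedding and first-layer attention tensors, etc.
```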

Highlighted Details

  • Single-file implementation for maximum transparency.
  • Detailed explanation of RoPE positional encoding using complex numbers.
  • Step-by-step breakdown of multi-head attention, including KV head sharing.
  • Implementation of the SwiGLU activation in the feed-forward network (sketched after this list).
  • Demonstrates how to load and interpret model parameters (params.json) and weights (consolidated.00.pth).
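To illustrate how the feed-forward step reduces to plain matrix multiplications, here is a sketch of a SwiGLU block. The weight names w1/w2/w3 follow the common Llama convention; the function name and shapes are assumptions, not the repository's exact variables.

```python
import torch
import torch.nn.functional as F

def swiglu_feed_forward(x: torch.Tensor,
                        w1: torch.Tensor,   # [hidden_dim, dim] gate projection
                        w3: torch.Tensor,   # [hidden_dim, dim] up projection
                        w2: torch.Tensor    # [dim, hidden_dim] down projection
                        ) -> torch.Tensor:
    # SwiGLU: down-project the elementwise product of a SiLU-gated branch and a linear branch
    gate = F.silu(torch.matmul(x, w1.T))    # [seq_len, hidden_dim]
    up = torch.matmul(x, w3.T)              # [seq_len, hidden_dim]
    return torch.matmul(gate * up, w2.T)    # [seq_len, dim]
```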

Maintenance & Community

The project is a personal endeavor by naklecha; no explicit community channels or roadmap are mentioned.

Licensing & Compatibility

The repository itself does not specify a license. The use of Llama 3 weights is subject to Meta's Llama 3 license.

Limitations & Caveats

This implementation is purely educational: it includes no optimizations for inference speed or memory efficiency and focuses on demonstrating core mechanics rather than production-ready deployment. Because everything lives in a single script, modularity for further development is limited.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 57 stars in the last 30 days

Explore Similar Projects

Starred by Ross Wightman (Author of timm; CV at Hugging Face), Awni Hannun (Author of MLX; Research Scientist at Apple), and 1 more.

mlx-llm by riccardomusmeci

454 stars
LLM tools/apps for Apple Silicon using MLX
Created 1 year ago, updated 7 months ago
Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 17 more.

open_llama by openlm-research

8k stars
Open-source reproduction of LLaMA models
Created 2 years ago, updated 2 years ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; Author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp and comma.ai), and 20 more.

TinyLlama by jzhang38

9k stars
Tiny pretraining project for a 1.1B Llama model
Created 2 years ago, updated 1 year ago