llama3-from-scratch by naklecha

Llama 3 implementation from scratch, one matrix multiplication at a time

Created 1 year ago
15,243 stars

Top 3.3% on SourcePulse

Project Summary

This repository provides a step-by-step, line-by-line implementation of Meta's Llama 3 8B model from scratch in a single Python file. It's designed for researchers and engineers who want a deep, practical understanding of transformer architectures and LLM internals, enabling them to dissect and potentially modify core components.

How It Works

The project meticulously reconstructs the Llama 3 architecture by loading the official Meta weights and processing them through fundamental PyTorch operations. It details each stage: tokenization using tiktoken, RMS normalization, RoPE positional embeddings implemented via complex-number multiplication, multi-head attention (including unwrapping the attention weight matrices and sharing KV heads across query heads), and the SwiGLU feed-forward network. The implementation emphasizes clarity over optimization, showing how each matrix multiplication contributes to the model's forward pass.
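
As a taste of the level the walkthrough operates at, RMS normalization reduces to a few tensor operations. The following is a minimal PyTorch sketch, not the repository's exact code; the eps value and the all-ones scale weight are illustrative stand-ins for the values loaded from the checkpoint:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Scale each token vector by the reciprocal of its root-mean-square,
    # then apply the learned per-dimension weight.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

# Example: a batch of 4 token embeddings of dimension 4096 (Llama 3 8B's dim).
tokens = torch.randn(4, 4096)
weight = torch.ones(4096)  # in the real model, loaded from the checkpoint
print(rms_norm(tokens, weight).shape)  # torch.Size([4, 4096])
```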

Quick Start & Requirements

  • Install: Requires PyTorch (tokenization additionally uses tiktoken).
  • Prerequisites: Download the Llama 3 8B weights from Meta's official download page.
  • Execution: Run the provided Python script; a minimal loading sketch follows this list.
  • Resources: Requires enough RAM to hold all model weights (roughly 16 GB for 8B parameters at 16-bit precision).
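
Loading the checkpoint and its configuration is plain torch.load plus a JSON read. A minimal sketch, assuming the download landed in a local Meta-Llama-3-8B/ directory (the paths are illustrative):

```python
import json
import torch

# Paths are assumptions; adjust to wherever Meta's download script placed the files.
weights = torch.load("Meta-Llama-3-8B/consolidated.00.pth", map_location="cpu")
with open("Meta-Llama-3-8B/params.json") as f:
    config = json.load(f)

# params.json holds the architecture hyperparameters:
# dim, n_layers, n_heads, n_kv_heads, vocab_size, norm_eps, rope_theta, ...
print(config)
print(list(weights.keys())[:3])  # tensor names such as "tok_embeddings.weight"
```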

Highlighted Details

  • Single-file implementation for maximum transparency.
  • Detailed explanation of RoPE positional encoding using complex numbers (see the first sketch after this list).
  • Step-by-step breakdown of multi-head attention, including KV head sharing (see the second sketch below).
  • Implementation of the SwiGLU activation in the feed-forward network (also in the second sketch below).
  • Demonstrates how to load and interpret model parameters (params.json) and weights (consolidated.00.pth).
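
To make the complex-number formulation of RoPE concrete, here is a minimal sketch in PyTorch. It is an illustration rather than the repository's exact code; the default theta of 500000 matches rope_theta in Llama 3's params.json:

```python
import torch

def apply_rope(x: torch.Tensor, theta: float = 500000.0) -> torch.Tensor:
    # x: [seq_len, head_dim] query or key vectors for a single head.
    seq_len, head_dim = x.shape
    # One rotation frequency per pair of dimensions.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Rotation angle for every (position, frequency) pair, as unit complex numbers.
    angles = torch.outer(torch.arange(seq_len).float(), freqs)
    freqs_cis = torch.polar(torch.ones_like(angles), angles)  # e^(i * angle)
    # Treat consecutive value pairs as complex numbers, rotate, convert back.
    x_complex = torch.view_as_complex(x.float().reshape(seq_len, -1, 2))
    x_rotated = torch.view_as_real(x_complex * freqs_cis).flatten(-2)
    return x_rotated.type_as(x)

q = torch.randn(8, 128)     # 8 positions, head_dim 128 as in Llama 3 8B
print(apply_rope(q).shape)  # torch.Size([8, 128])
```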

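The second sketch covers KV head sharing and SwiGLU together. The head counts match Llama 3 8B (32 query heads, 8 KV heads, head_dim 128); the feed-forward dimensions are shrunk here so the example runs instantly (the real model uses dim 4096 and hidden dim 14336). Again a hedged illustration, not the repository's code:

```python
import torch
import torch.nn.functional as F

# --- KV head sharing (grouped-query attention) ---
# 32 query heads share 8 KV heads, so each KV head serves 4 query heads.
# A simple way to express this is to repeat each KV head 4 times.
n_heads, n_kv_heads, head_dim, seq_len = 32, 8, 128, 16
q = torch.randn(n_heads, seq_len, head_dim)
k = torch.randn(n_kv_heads, seq_len, head_dim)
v = torch.randn(n_kv_heads, seq_len, head_dim)
k = k.repeat_interleave(n_heads // n_kv_heads, dim=0)  # -> [32, 16, 128]
v = v.repeat_interleave(n_heads // n_kv_heads, dim=0)

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5     # [32, 16, 16]
causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
out = F.softmax(scores + causal, dim=-1) @ v           # [32, 16, 128]

# --- SwiGLU feed-forward: w2(silu(w1 x) * w3 x) ---
dim, hidden = 256, 688  # toy sizes; Llama 3 8B uses 4096 and 14336
w1, w3 = torch.randn(hidden, dim), torch.randn(hidden, dim)
w2 = torch.randn(dim, hidden)
x = torch.randn(seq_len, dim)
ffn_out = (F.silu(x @ w1.T) * (x @ w3.T)) @ w2.T       # [16, 256]
print(out.shape, ffn_out.shape)
```
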
Maintenance & Community

The project is a personal endeavor by naklecha. No community channels or roadmap are mentioned.

Licensing & Compatibility

The repository itself does not specify a license. The use of Llama 3 weights is subject to Meta's Llama 3 license.

Limitations & Caveats

This implementation is purely educational and does not include optimizations for inference speed or memory efficiency. It focuses on demonstrating the core mechanics rather than production-ready deployment. The project is presented as a single script, implying limited modularity for further development.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 32 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 17 more.

open_llama by openlm-research — Top 0.1% on SourcePulse, ~8k stars
Open-source reproduction of LLaMA models
Created 2 years ago · Updated 2 years ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 21 more.

TinyLlama by jzhang38 — Top 0.1% on SourcePulse, ~9k stars
Pretraining project for a 1.1B-parameter Llama model
Created 2 years ago · Updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind) and Romain Huet (Head of Developer Experience at OpenAI).

llama-models by meta-llama — Top 0.1% on SourcePulse, ~8k stars
Utilities for Llama models
Created 1 year ago · Updated 1 month ago