llama3-from-scratch by naklecha

Llama3 implementation from scratch, one matrix multiplication at a time

created 1 year ago · 15,060 stars · Top 3.4% on sourcepulse

Project Summary

This repository provides a step-by-step, line-by-line implementation of Meta's Llama 3 8B model from scratch in a single Python file. It's designed for researchers and engineers who want a deep, practical understanding of transformer architectures and LLM internals, enabling them to dissect and potentially modify core components.

How It Works

The project meticulously reconstructs the Llama 3 architecture by loading the official Meta weights and passing them through fundamental PyTorch operations. It details each stage: tokenization with tiktoken, RMS normalization, RoPE positional embeddings implemented via complex-number multiplication, multi-head attention (including unwrapping the fused per-head weight matrices and KV head sharing across query heads, i.e. grouped-query attention), and the SwiGLU feed-forward network. The implementation emphasizes clarity over optimization, showing how each matrix multiplication contributes to the model's forward pass.
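As a concrete illustration of the RoPE step, the rotation can be written as an elementwise complex multiplication. The sketch below is ours, not the repo's code: head_dim=128 and rope_theta=500000.0 are the Llama 3 8B values from params.json, while seq_len and all variable names are illustrative.

```python
import torch

head_dim, rope_theta, seq_len = 128, 500000.0, 17

# One rotation frequency per pair of adjacent dimensions.
freqs = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
# Rotation angle for every (position, frequency) combination.
angles = torch.outer(torch.arange(seq_len).float(), freqs)
# Unit complex numbers e^(i*angle); multiplying by them rotates each dim pair.
freqs_cis = torch.polar(torch.ones_like(angles), angles)   # (seq_len, 64)

q = torch.randn(seq_len, head_dim)                         # one attention head
# View adjacent dimension pairs as complex numbers, rotate, convert back.
q_complex = torch.view_as_complex(q.reshape(seq_len, -1, 2))
q_rotated = torch.view_as_real(q_complex * freqs_cis).reshape(seq_len, head_dim)
```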

Quick Start & Requirements

  • Install: Requires PyTorch, plus tiktoken for the tokenization step noted above.
  • Prerequisites: Download Llama 3 8B weights from Meta's official download page.
  • Execution: Run the provided Python script; a weight-loading sketch follows this list.
  • Resources: Requires sufficient RAM to hold the full set of weights (roughly 16 GB for the 8B bf16 checkpoint).
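A minimal sketch of the loading step, assuming the download landed in a Meta-Llama-3-8B/ directory (the directory path and variable names here are ours; the file names params.json and consolidated.00.pth come from the checkpoint itself):

```python
import json
import torch

# Paths assume Meta's default download layout; adjust to your local copy.
weights_path = "Meta-Llama-3-8B/consolidated.00.pth"
params_path = "Meta-Llama-3-8B/params.json"

with open(params_path) as f:
    config = json.load(f)          # dim, n_layers, n_heads, n_kv_heads, ...

# map_location="cpu" keeps the full checkpoint in RAM rather than on a GPU.
model = torch.load(weights_path, map_location="cpu")
print(config["n_layers"], config["n_heads"])   # 32 and 32 for the 8B model
print(list(model.keys())[:3])                  # per-layer weight tensor names
```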

Highlighted Details

  • Single-file implementation for maximum transparency.
  • Detailed explanation of RoPE positional encoding using complex numbers.
  • Step-by-step breakdown of multi-head attention, including KV head sharing.
  • Implementation of the SwiGLU activation in the feed-forward network (see the sketch after this list).
  • Demonstrates how to load and interpret model parameters (params.json) and weights (consolidated.00.pth).
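To make the SwiGLU bullet concrete, here is a minimal sketch of the gated feed-forward computation. The w1/w2/w3 naming follows Meta's checkpoint keys (feed_forward.w1/w2/w3), the shapes are the 8B model's, and the random tensors stand in for real weights.

```python
import torch
import torch.nn.functional as F

def swiglu_ffn(x, w1, w2, w3):
    # SwiGLU: a silu-gated up-projection followed by a down-projection.
    # Each weight is stored as (out_features, in_features), hence the .T.
    return (F.silu(x @ w1.T) * (x @ w3.T)) @ w2.T

# Llama 3 8B sizes: model dim 4096, FFN hidden dim 14336.
dim, hidden = 4096, 14336
x = torch.randn(1, dim)
w1, w3 = torch.randn(hidden, dim), torch.randn(hidden, dim)
w2 = torch.randn(dim, hidden)
print(swiglu_ffn(x, w1, w2, w3).shape)   # torch.Size([1, 4096])
```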

Maintenance & Community

The project is a personal endeavor by naklecha. No community channels or roadmap are mentioned.

Licensing & Compatibility

The repository itself does not specify a license. The use of Llama 3 weights is subject to Meta's Llama 3 license.

Limitations & Caveats

This implementation is purely educational: it includes no optimizations for inference speed or memory efficiency and targets core mechanics rather than production-ready deployment. Because everything lives in a single script, modularity for further development is limited.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 220 stars in the last 90 days
