llama3-from-scratch by naklecha

Llama3 implementation from scratch, one matrix multiplication at a time

created 1 year ago · 15,060 stars · Top 3.4% on sourcepulse

Project Summary

This repository provides a step-by-step, line-by-line implementation of Meta's Llama 3 8B model from scratch in a single Python file. It's designed for researchers and engineers who want a deep, practical understanding of transformer architectures and LLM internals, enabling them to dissect and potentially modify core components.

How It Works

The project meticulously reconstructs the Llama 3 architecture by loading the official Meta weights and passing them through fundamental PyTorch operations. It details each stage: tokenization with tiktoken, RMS normalization, RoPE positional embeddings implemented via complex-number multiplication, multi-head attention (including unwrapping the fused per-head weight matrices and KV head sharing across query heads, i.e. grouped-query attention), and the SwiGLU feed-forward network. The implementation emphasizes clarity over optimization, showing how each matrix multiplication contributes to the model's forward pass.
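As a concrete illustration of the RoPE step, the rotation can be written as an elementwise complex multiplication. The sketch below is ours, not the repo's code: head_dim=128 and rope_theta=500000.0 are the Llama 3 8B values from params.json, while seq_len and all variable names are illustrative.

```python
import torch

head_dim, rope_theta, seq_len = 128, 500000.0, 17

# One rotation frequency per pair of adjacent dimensions.
freqs = 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
# Rotation angle for every (position, frequency) combination.
angles = torch.outer(torch.arange(seq_len).float(), freqs)
# Unit complex numbers e^(i*angle); multiplying by them rotates each dim pair.
freqs_cis = torch.polar(torch.ones_like(angles), angles)   # (seq_len, 64)

q = torch.randn(seq_len, head_dim)                         # one attention head
# View adjacent dimension pairs as complex numbers, rotate, convert back.
q_complex = torch.view_as_complex(q.reshape(seq_len, -1, 2))
q_rotated = torch.view_as_real(q_complex * freqs_cis).reshape(seq_len, head_dim)
```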

Quick Start & Requirements

  • Install: Requires PyTorch, plus tiktoken for the tokenization step noted above.
  • Prerequisites: Download Llama 3 8B weights from Meta's official download page.
  • Execution: Run the provided Python script; a weight-loading sketch follows this list.
  • Resources: Requires sufficient RAM to hold the full set of weights (roughly 16 GB for the 8B bf16 checkpoint).
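A minimal sketch of the loading step, assuming the download landed in a Meta-Llama-3-8B/ directory (the directory path and variable names here are ours; the file names params.json and consolidated.00.pth come from the checkpoint itself):

```python
import json
import torch

# Paths assume Meta's default download layout; adjust to your local copy.
weights_path = "Meta-Llama-3-8B/consolidated.00.pth"
params_path = "Meta-Llama-3-8B/params.json"

with open(params_path) as f:
    config = json.load(f)          # dim, n_layers, n_heads, n_kv_heads, ...

# map_location="cpu" keeps the full checkpoint in RAM rather than on a GPU.
model = torch.load(weights_path, map_location="cpu")
print(config["n_layers"], config["n_heads"])   # 32 and 32 for the 8B model
print(list(model.keys())[:3])                  # per-layer weight tensor names
```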

Highlighted Details

  • Single-file implementation for maximum transparency.
  • Detailed explanation of RoPE positional encoding using complex numbers.
  • Step-by-step breakdown of multi-head attention, including KV head sharing.
  • Implementation of the SwiGLU activation in the feed-forward network (see the sketch after this list).
  • Demonstrates how to load and interpret model parameters (params.json) and weights (consolidated.00.pth).
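To make the SwiGLU bullet concrete, here is a minimal sketch of the gated feed-forward computation. The w1/w2/w3 naming follows Meta's checkpoint keys (feed_forward.w1/w2/w3), the shapes are the 8B model's, and the random tensors stand in for real weights.

```python
import torch
import torch.nn.functional as F

def swiglu_ffn(x, w1, w2, w3):
    # SwiGLU: a silu-gated up-projection followed by a down-projection.
    # Each weight is stored as (out_features, in_features), hence the .T.
    return (F.silu(x @ w1.T) * (x @ w3.T)) @ w2.T

# Llama 3 8B sizes: model dim 4096, FFN hidden dim 14336.
dim, hidden = 4096, 14336
x = torch.randn(1, dim)
w1, w3 = torch.randn(hidden, dim), torch.randn(hidden, dim)
w2 = torch.randn(dim, hidden)
print(swiglu_ffn(x, w1, w2, w3).shape)   # torch.Size([1, 4096])
```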

Maintenance & Community

The project is a personal endeavor by naklecha. No community channels or roadmap are mentioned.

Licensing & Compatibility

The repository itself does not specify a license. The use of Llama 3 weights is subject to Meta's Llama 3 license.

Limitations & Caveats

This implementation is purely educational: it includes no optimizations for inference speed or memory efficiency and targets core mechanics rather than production-ready deployment. Because everything lives in a single script, modularity for further development is limited.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 220 stars in the last 90 days
