llama-from-scratch by bkitano

Educational resource for implementing Llama from scratch

created 2 years ago
573 stars

Top 57.1% on sourcepulse

Project Summary

This repository provides a step-by-step guide to implementing the Llama transformer architecture from scratch using PyTorch. It targets engineers and researchers interested in understanding and replicating Llama's core components, offering a practical, educational approach to building a language model.

How It Works

The project breaks down the Llama architecture into its key components: RMSNorm, rotary positional embeddings (RoPE), and the SwiGLU activation. It starts with a simple feed-forward network, then iteratively adds and tests each Llama-specific modification. The implementation emphasizes testing individual layers and components on a small dataset (TinyShakespeare) with clear evaluation metrics to ensure correctness before integrating them into the full model.
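As a rough illustration of the first of these components, a minimal RMSNorm layer in PyTorch might look like the sketch below; the class name, epsilon value, and learnable scale are assumptions for illustration, not the repository's exact code.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales activations without centering them."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))  # learnable per-feature gain (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); normalize over the last (feature) dimension
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.scale * x / rms
```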

Quick Start & Requirements

  • Install: Standard Python environment with PyTorch.
  • Prerequisites: Python 3.x, PyTorch, NumPy, Matplotlib, Pandas. A GPU is not strictly required for basic execution, but is recommended for faster training.
  • Setup: Clone the repository and run the provided Python scripts. The README includes runnable code snippets for each step; a rough data-preparation sketch follows this list.
  • Links: The project is heavily inspired by Andrej Karpathy's "makemore" series.
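For orientation, character-level data preparation in the spirit of the repository might look like the following sketch. The file name `input.txt` and the 90/10 train/validation split are assumptions, not details taken from the repository.

```python
import torch

# Load TinyShakespeare as raw text (path is hypothetical).
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Character-level vocabulary: map each unique character to an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))              # assumed 90/10 train/validation split
train_data, val_data = data[:n], data[n:]
```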

Highlighted Details

  • Iterative Implementation: Builds the model piece-by-piece, testing each component.
  • Educational Focus: Explains the rationale behind architectural choices (RMSNorm vs. BatchNorm, RoPE, SwiGLU); a SwiGLU sketch follows this list.
  • Debugging Insights: Demonstrates how to debug tensor shapes and gradient flow.
  • Performance: Achieves a validation loss of ~1.08 on TinyShakespeare with a 4-layer model.
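As a rough sketch of one of those architectural choices, a SwiGLU feed-forward block in PyTorch could be written as follows; the hidden-size convention, class name, and layer names are assumptions rather than the repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block with a SwiGLU gate: silu(W1 x) * (W3 x), then project back."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```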

Maintenance & Community

The project is a personal educational effort by bkitano, inspired by Andrej Karpathy's work. No community channels or roadmap are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The code is presented as educational material.

Limitations & Caveats

The implementation uses a character-level tokenizer and a significantly smaller dataset (TinyShakespeare) than the original Llama. The learning rate schedule from the original Llama paper did not perform as expected in this implementation.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days
