llama-from-scratch by bkitano

Educational resource for implementing Llama from scratch

created 2 years ago
573 stars

Top 57.1% on sourcepulse

Project Summary

This repository provides a step-by-step guide to implementing the Llama transformer architecture from scratch using PyTorch. It targets engineers and researchers interested in understanding and replicating Llama's core components, offering a practical, educational approach to building a language model.

How It Works

The project breaks down the Llama architecture into its key components: RMSNorm, rotary positional embeddings (RoPE), and the SwiGLU activation. It starts with a simple feed-forward network, then iteratively adds and tests each Llama-specific modification. The implementation emphasizes testing individual layers and components on a small dataset (TinyShakespeare) with clear evaluation metrics to ensure correctness before integrating them into the full model.
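As a rough illustration of the first of these components, a minimal RMSNorm layer in PyTorch might look like the sketch below; the class name, epsilon value, and learnable scale are assumptions for illustration, not the repository's exact code.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales activations without centering them."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))  # learnable per-feature gain (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); normalize over the last (feature) dimension
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.scale * x / rms
```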

Quick Start & Requirements

  • Install: Standard Python environment with PyTorch.
  • Prerequisites: Python 3.x, PyTorch, NumPy, Matplotlib, Pandas. A GPU is not strictly required for basic execution, but is recommended for faster training.
  • Setup: Clone the repository and run the provided Python scripts. The README includes runnable code snippets for each step; a rough data-preparation sketch follows this list.
  • Links: The project is heavily inspired by Andrej Karpathy's "makemore" series.
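For orientation, character-level data preparation in the spirit of the repository might look like the following sketch. The file name `input.txt` and the 90/10 train/validation split are assumptions, not details taken from the repository.

```python
import torch

# Load TinyShakespeare as raw text (path is hypothetical).
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Character-level vocabulary: map each unique character to an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))              # assumed 90/10 train/validation split
train_data, val_data = data[:n], data[n:]
```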

Highlighted Details

  • Iterative Implementation: Builds the model piece-by-piece, testing each component.
  • Educational Focus: Explains the rationale behind architectural choices (RMSNorm vs. BatchNorm, RoPE, SwiGLU); a SwiGLU sketch follows this list.
  • Debugging Insights: Demonstrates how to debug tensor shapes and gradient flow.
  • Performance: Achieves a validation loss of ~1.08 on TinyShakespeare with a 4-layer model.
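As a rough sketch of one of those architectural choices, a SwiGLU feed-forward block in PyTorch could be written as follows; the hidden-size convention, class name, and layer names are assumptions rather than the repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block with a SwiGLU gate: silu(W1 x) * (W3 x), then project back."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```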

Maintenance & Community

The project is a personal educational effort by bkitano, inspired by Andrej Karpathy's work. No community channels or roadmap are mentioned.

Licensing & Compatibility

The repository does not explicitly state a license. The code is presented as educational material.

Limitations & Caveats

The implementation uses a character-level tokenizer and a significantly smaller dataset (TinyShakespeare) than the original Llama. The learning rate schedule from the original Llama paper did not perform as expected in this implementation.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days
