build-nanogpt  by karpathy

Educational resource for building nanoGPT from scratch

created 1 year ago
4,249 stars

Top 11.7% on sourcepulse

Project Summary

This repository provides a step-by-step, from-scratch reproduction of nanoGPT, accompanied by a video lecture. It targets developers and researchers who want to understand and replicate the GPT-2 (124M) architecture and training run, which the project reports as taking about 1 hour and roughly $10 on a cloud GPU.
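
To make the "124M" figure concrete, the sketch below counts parameters for a GPT-2 style decoder. The hyperparameters (12 layers, 768-dimensional embeddings, a 50,257-token vocabulary, a 1,024-token context) are not stated on this page; they are assumptions taken from the publicly documented GPT-2 small configuration, and the output projection is assumed to share weights with the token embedding. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024):
    """Count parameters of a GPT-2 style decoder (assumed layout, tied output weights)."""
    wte = vocab_size * n_embd            # token embedding table
    wpe = block_size * n_embd            # learned position embedding table
    layer_norm = 2 * n_embd              # scale and bias
    # Attention: fused q/k/v projection plus output projection, each with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4x width, then project back, each with bias.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * layer_norm + attn + mlp  # two layer norms per transformer block
    # The final layer norm is counted once; the tied output head adds nothing.
    return wte + wpe + n_layer * block + layer_norm


if __name__ == "__main__":
    total = gpt2_param_count()
    print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the count comes to 124,439,808, which is where the "124M" label comes from.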

How It Works

The project recreates the GPT-2 (124M) model through clean, incremental Git commits, so readers can trace how the code evolves step by step. It covers only the core language modeling task: the model is trained on internet documents to predict the next token. The emphasis is on clarity and educational value.
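
The next-token objective described above can be illustrated with a small, dependency-free sketch. This is not the repository's training code (which uses PyTorch); `next_token_loss` and its inputs are hypothetical names chosen for illustration. It computes the average cross-entropy of predicting token t+1 from the scores produced at position t.

```python
import math


def next_token_loss(logits, tokens):
    """Average cross-entropy of predicting tokens[t + 1] from position t.

    logits[t] is a list of unnormalized scores over the vocabulary at
    position t; tokens is a list of integer token ids.
    """
    n_predictions = len(tokens) - 1  # the last token has no successor to predict
    total = 0.0
    for t in range(n_predictions):
        row = logits[t]
        target = tokens[t + 1]       # the label is the input shifted by one
        # Numerically stable log-sum-exp gives the log of the softmax normalizer.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # negative log-probability of the target
    return total / n_predictions


if __name__ == "__main__":
    vocab_size = 4
    tokens = [0, 2, 1, 3]
    # A model that scores every token equally has loss ln(vocab_size).
    uniform = [[0.0] * vocab_size for _ in range(len(tokens) - 1)]
    print(next_token_loss(uniform, tokens), math.log(vocab_size))
```

A model with uniform scores gets a loss of ln(vocab_size), which is a useful sanity check on the starting loss: training should push the loss below that baseline.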

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.x, PyTorch, NumPy. A GPU is recommended for training (e.g., a cloud instance from Lambda Labs).
  • Training: Reproducing GPT-2 (124M) takes approximately 1 hour and costs about $10 on a cloud GPU.
  • Resources: Official YouTube lecture: https://www.youtube.com/watch?v=kCc8FmEb1nY
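
Before starting a training run, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch, not part of the repository; `check_environment` and `pick_device` are hypothetical helper names, and falling back to the CPU when no GPU is found is an assumption made for illustration.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "numpy")):
    """Report the Python version and whether each package can be imported."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


def pick_device():
    """Return 'cuda' if PyTorch is installed and sees a GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so the check also works without PyTorch
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(check_environment())
    print("device:", pick_device())
```

Training on the CPU is possible in principle, but the time and cost figures quoted above assume a cloud GPU.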

Highlighted Details

  • Step-by-step Git commits for clear learning.
  • Reproduces GPT-2 (124M) model training.
  • Low-cost, fast training: about $10 and about 1 hour on a cloud GPU.
  • Focuses on foundational language modeling, not chat fine-tuning.

Maintenance & Community

  • Active community discussions via GitHub Discussions and Zero To Hero Discord (#nanoGPT channel).
  • Errata and fixes are addressed in the repository.

Licensing & Compatibility

  • License: MIT
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The project focuses solely on foundational language model training and does not cover chat fine-tuning or conversational AI capabilities. Compatibility with older PyTorch versions may require specific workarounds for type casting.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 1
  • Star History: 199 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

0.5%
393
Minimal GPT-2 implementation for educational purposes
created 1 year ago
updated 1 year ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 5 more.

gpt-neo by EleutherAI

0.0%
8k
GPT-2/3-style model implementation using mesh-tensorflow
created 5 years ago
updated 3 years ago