nanoGPT by karpathy

Minimalist repo for training/finetuning GPT models

created 2 years ago · 43,292 stars · Top 0.6% on sourcepulse

Project Summary

nanoGPT provides a minimalist and efficient codebase for training and fine-tuning medium-sized GPT models. It's designed for researchers and practitioners who need a simple, hackable implementation to experiment with GPT architectures, reproduce existing results, or train custom language models. The project emphasizes clarity and performance, allowing users to train models from scratch or fine-tune pre-trained checkpoints.

How It Works

nanoGPT is built around two core Python files: train.py, which contains the training loop, and model.py, which defines the GPT model. It uses PyTorch for its deep learning operations and offers optional integrations with the transformers and datasets libraries for loading pre-trained weights and data. The design prioritizes readability: the training loop and model definition are each roughly 300 lines, making them easy to understand and modify. PyTorch 2.0's torch.compile is supported for significant training speedups.
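
For orientation, here is a minimal sketch of how those pieces fit together, assuming the GPTConfig/GPT interface defined in the repo's model.py; the hyperparameter values are illustrative, not the repo's defaults:

```python
import torch
from model import GPTConfig, GPT  # model.py from the nanoGPT repo

# A small GPT; field names follow model.py, the values are illustrative.
config = GPTConfig(
    block_size=256,    # context length
    vocab_size=50304,  # GPT-2 vocab padded up for efficiency, as in the repo
    n_layer=6,
    n_head=6,
    n_embd=384,
    dropout=0.1,
)
model = GPT(config)

# PyTorch 2.0's torch.compile JIT-compiles the model for a significant speedup
# (requires PyTorch >= 2.0; see Limitations & Caveats for unsupported platforms).
model = torch.compile(model)

# One training micro-step: the forward pass returns logits and a cross-entropy loss.
x = torch.randint(0, config.vocab_size, (8, config.block_size))  # dummy token ids
y = torch.randint(0, config.vocab_size, (8, config.block_size))  # dummy targets
logits, loss = model(x, y)
loss.backward()
```

train.py wraps this same loop with gradient accumulation, mixed precision, checkpointing, and optional DDP and wandb logging.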

Quick Start & Requirements

  • Install: pip install torch numpy transformers datasets tiktoken wandb tqdm
  • Prerequisites: PyTorch, NumPy, Hugging Face Transformers and Datasets (optional, for loading pre-trained weights and data), OpenAI's tiktoken, Weights & Biases (optional, for logging), tqdm; a short example exercising the optional pieces follows this list.
  • Hardware: Can run on a single GPU (e.g., A100) for larger models or even a CPU for smaller experiments. Apple Silicon GPUs are supported via --device=mps.
  • Resources: Training GPT-2 (124M) on OpenWebText takes ~4 days on an 8x A100 node. Character-level Shakespeare training takes ~3 minutes on a single GPU.
  • Docs: Zero To Hero series, Discord
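
As a quick sanity check that the optional dependencies are wired up, the sketch below loads the pretrained GPT-2 (124M) weights and samples a short continuation. It assumes model.py's GPT.from_pretrained and generate methods as found in the upstream repo; the prompt and sampling settings are arbitrary:

```python
import torch
import tiktoken
from model import GPT  # model.py from the nanoGPT repo

device = "cuda" if torch.cuda.is_available() else "cpu"  # or "mps" on Apple Silicon

# Pull the GPT-2 (124M) weights via Hugging Face transformers (optional dependency).
model = GPT.from_pretrained("gpt2")
model.eval().to(device)

# Tokenize a prompt with OpenAI's tiktoken BPE and generate a short continuation.
enc = tiktoken.get_encoding("gpt2")
prompt = torch.tensor([enc.encode("Hello, I'm a language model,")],
                      dtype=torch.long, device=device)
out = model.generate(prompt, max_new_tokens=50, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```

The repo's sample.py script does the same end to end, including sampling from checkpoints produced by train.py.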

Highlighted Details

  • Reproduces GPT-2 (124M) on OpenWebText in ~4 days on an 8x A100 node.
  • Achieves ~2.85 validation loss when fine-tuning GPT-2 (124M) on OpenWebText.
  • Supports training from scratch, fine-tuning, and sampling from models.
  • Offers configuration files for various model sizes and datasets (e.g., Shakespeare, OpenWebText); see the config sketch after this list.
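
The configuration files are plain Python that override the training globals declared near the top of train.py. A sketch of what a small character-level config might look like (the file name is hypothetical, the variable names follow train.py, and the values are illustrative):

```python
# config/my_char_model.py (hypothetical file name)
# Each assignment overrides a global of the same name in train.py.
out_dir = "out-my-char-model"   # where checkpoints are written
dataset = "shakespeare_char"    # expects data/shakespeare_char/ to be prepared first
batch_size = 64
block_size = 256                # context length
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
learning_rate = 1e-3
max_iters = 5000
compile = False                 # disable torch.compile where it is unavailable
```

A config is passed as the first argument to the trainer (e.g. python train.py config/train_shakespeare_char.py), and individual settings can be overridden further on the command line, e.g. --device=mps or --compile=False.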

Maintenance & Community

The project is written and maintained by Andrej Karpathy and is sponsored by Lambda Labs. Discussions and support are available on Discord.

Licensing & Compatibility

MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Some features, such as FSDP integration, are still pending. PyTorch 2.0's torch.compile is experimental and may not be available on all platforms (e.g., Windows); in those cases pass the --compile=False flag.
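
Within the repo the workaround is simply the config override above (--compile=False). For standalone code built on model.py, a rough fallback guard might look like the sketch below (the platform check is an assumption for illustration, not something train.py does):

```python
import sys
import torch
from model import GPTConfig, GPT  # model.py from the nanoGPT repo

model = GPT(GPTConfig())  # default config; stand-in for a real setup

# Only compile where torch.compile exists (PyTorch >= 2.0) and the platform is
# known to support it; otherwise fall back to eager execution.
if hasattr(torch, "compile") and sys.platform != "win32":
    model = torch.compile(model)
```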

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 2
  • Star History: 2,498 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM
Transformer library for flexible model development
Top 0.3% · 1k stars · created 3 years ago · updated 7 months ago

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 5 more.

gpt-neo by EleutherAI
GPT-2/3-style model implementation using mesh-tensorflow
Top 0.0% · 8k stars · created 5 years ago · updated 3 years ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai
Deep learning optimization library for distributed training and inference
Top 0.2% · 40k stars · created 5 years ago · updated 21 hours ago