nanoGPT by karpathy

Minimalist repo for training/finetuning GPT models

Created 2 years ago
44,451 stars

Top 0.6% on SourcePulse

View on GitHub
Project Summary

nanoGPT provides a minimalist and efficient codebase for training and fine-tuning medium-sized GPT models. It's designed for researchers and practitioners who need a simple, hackable implementation to experiment with GPT architectures, reproduce existing results, or train custom language models. The project emphasizes clarity and performance, allowing users to train models from scratch or fine-tune pre-trained checkpoints.

How It Works

nanoGPT is built around two core files: train.py, which contains the training loop, and model.py, which defines the GPT model. It is written in PyTorch and includes optional integrations with Hugging Face transformers and datasets for loading pre-trained GPT-2 weights and preparing data. The design prioritizes readability: the training loop and the model definition are each roughly 300 lines, making them easy to understand and modify. PyTorch 2.0's torch.compile is supported for significant training speedups.
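
A rough sketch of the resulting workflow, with script names and flags taken from the repo's README (verify against the current version; a dataset must be prepared under data/ before training):

    python train.py config/train_gpt2.py                   # train with settings from a config file
    python train.py config/train_gpt2.py --batch_size=32   # any config value can be overridden on the command line
    python sample.py --out_dir=out                          # sample from the checkpoint in the given output directory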

Quick Start & Requirements

  • Install: pip install torch numpy transformers datasets tiktoken wandb tqdm
  • Prerequisites: PyTorch, NumPy, Hugging Face Transformers and Datasets (optional), OpenAI's tiktoken, Weights & Biases (optional), tqdm.
  • Hardware: Can run on a single GPU (e.g., A100) for larger models or even a CPU for smaller experiments. Apple Silicon GPUs are supported via --device=mps.
  • Resources: Training GPT-2 (124M) on OpenWebText takes ~4 days on an 8x A100 node. Character-level Shakespeare training takes ~3 minutes on a single GPU (example commands after this list).
  • Docs: Zero To Hero series, Discord
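
As a concrete example, the README's character-level Shakespeare quick start comes down to three commands (paths as they appear in the repo; check the README for current details):

    python data/shakespeare_char/prepare.py               # download and tokenize tiny Shakespeare
    python train.py config/train_shakespeare_char.py      # train a small character-level GPT (~3 minutes on one GPU)
    python sample.py --out_dir=out-shakespeare-char       # print samples from the trained model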

Highlighted Details

  • Reproduces GPT-2 (124M) on OpenWebText in ~4 days on an 8x A100 node (launch sketch after this list).
  • Achieves ~2.85 validation loss when fine-tuning GPT-2 (124M) on OpenWebText.
  • Supports training from scratch, fine-tuning, and sampling from models.
  • Offers configuration files for various model sizes and datasets (e.g., Shakespeare, OpenWebText).
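
For reference, the GPT-2 (124M) reproduction is launched with PyTorch DDP via torchrun; the commands below mirror the README and assume an 8-GPU node with OpenWebText prepared first:

    python data/openwebtext/prepare.py                                       # download and tokenize OpenWebText
    torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py   # DDP training across 8 GPUs (~4 days)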

Maintenance & Community

The project is developed by Andrej Karpathy and sponsored by Lambda Labs. Discussions and support are available on Discord.

Licensing & Compatibility

MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Some planned features, such as FSDP integration, are still pending. PyTorch 2.0's torch.compile is still fairly new and experimental and may not be available on all platforms (e.g., Windows); pass --compile=False in such cases.
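
For example, a CPU-only or Windows run of the character-level Shakespeare config can disable compilation explicitly (flags as documented in the README; illustrative only):

    python train.py config/train_shakespeare_char.py --device=cpu --compile=False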

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 5

Star History

836 stars in the last 30 days

Explore Similar Projects

Starred by Victor Taelin (Author of Bend, Kind, HVM), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 2 more.

nanoT5 by PiotrNawrot

0.2%
1k
PyTorch code for T5 pre-training and fine-tuning on a single GPU
Created 2 years ago
Updated 1 year ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Casper Hansen (Author of AutoAWQ), and 1 more.

GPT2 by ConnorJL

0%
1k
GPT2 training implementation, supporting TPUs and GPUs
Created 6 years ago
Updated 2 years ago
Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface

0.3%
9k
PyTorch training helper for distributed execution
Created 4 years ago
Updated 1 day ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of Clickhouse), and 29 more.

llm.c by karpathy

0.2%
28k
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago
Updated 2 months ago