nanoGPT by karpathy

Minimalist repo for training/finetuning GPT models

created 2 years ago · 43,292 stars · Top 0.6% on sourcepulse

Project Summary

nanoGPT provides a minimalist and efficient codebase for training and fine-tuning medium-sized GPT models. It's designed for researchers and practitioners who need a simple, hackable implementation to experiment with GPT architectures, reproduce existing results, or train custom language models. The project emphasizes clarity and performance, allowing users to train models from scratch or fine-tune pre-trained checkpoints.

How It Works

nanoGPT is built around two core Python files: train.py, which contains the training loop, and model.py, which defines the GPT model. It uses PyTorch for its deep learning operations and offers optional integrations with the transformers and datasets libraries for loading pre-trained weights and data. The design prioritizes readability: the training loop and model definition are each roughly 300 lines, making them easy to understand and modify. PyTorch 2.0's torch.compile is supported for significant training speedups.
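
For orientation, here is a minimal sketch of how those pieces fit together, assuming the GPTConfig/GPT interface defined in the repo's model.py; the hyperparameter values are illustrative, not the repo's defaults:

```python
import torch
from model import GPTConfig, GPT  # model.py from the nanoGPT repo

# A small GPT; field names follow model.py, the values are illustrative.
config = GPTConfig(
    block_size=256,    # context length
    vocab_size=50304,  # GPT-2 vocab padded up for efficiency, as in the repo
    n_layer=6,
    n_head=6,
    n_embd=384,
    dropout=0.1,
)
model = GPT(config)

# PyTorch 2.0's torch.compile JIT-compiles the model for a significant speedup
# (requires PyTorch >= 2.0; see Limitations & Caveats for unsupported platforms).
model = torch.compile(model)

# One training micro-step: the forward pass returns logits and a cross-entropy loss.
x = torch.randint(0, config.vocab_size, (8, config.block_size))  # dummy token ids
y = torch.randint(0, config.vocab_size, (8, config.block_size))  # dummy targets
logits, loss = model(x, y)
loss.backward()
```

train.py wraps this same loop with gradient accumulation, mixed precision, checkpointing, and optional DDP and wandb logging.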

Quick Start & Requirements

  • Install: pip install torch numpy transformers datasets tiktoken wandb tqdm
  • Prerequisites: PyTorch, NumPy, Hugging Face Transformers and Datasets (optional, for loading pre-trained weights and data), OpenAI's tiktoken, Weights & Biases (optional, for logging), tqdm; a short example exercising the optional pieces follows this list.
  • Hardware: Can run on a single GPU (e.g., A100) for larger models or even a CPU for smaller experiments. Apple Silicon GPUs are supported via --device=mps.
  • Resources: Training GPT-2 (124M) on OpenWebText takes ~4 days on an 8x A100 node. Character-level Shakespeare training takes ~3 minutes on a single GPU.
  • Docs: Zero To Hero series, Discord
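
As a quick sanity check that the optional dependencies are wired up, the sketch below loads the pretrained GPT-2 (124M) weights and samples a short continuation. It assumes model.py's GPT.from_pretrained and generate methods as found in the upstream repo; the prompt and sampling settings are arbitrary:

```python
import torch
import tiktoken
from model import GPT  # model.py from the nanoGPT repo

device = "cuda" if torch.cuda.is_available() else "cpu"  # or "mps" on Apple Silicon

# Pull the GPT-2 (124M) weights via Hugging Face transformers (optional dependency).
model = GPT.from_pretrained("gpt2")
model.eval().to(device)

# Tokenize a prompt with OpenAI's tiktoken BPE and generate a short continuation.
enc = tiktoken.get_encoding("gpt2")
prompt = torch.tensor([enc.encode("Hello, I'm a language model,")],
                      dtype=torch.long, device=device)
out = model.generate(prompt, max_new_tokens=50, temperature=0.8, top_k=200)
print(enc.decode(out[0].tolist()))
```

The repo's sample.py script does the same end to end, including sampling from checkpoints produced by train.py.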

Highlighted Details

  • Reproduces GPT-2 (124M) on OpenWebText in ~4 days on an 8x A100 node.
  • Achieves ~2.85 validation loss when fine-tuning GPT-2 (124M) on OpenWebText.
  • Supports training from scratch, fine-tuning, and sampling from models.
  • Offers configuration files for various model sizes and datasets (e.g., Shakespeare, OpenWebText); see the config sketch after this list.
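
The configuration files are plain Python that override the training globals declared near the top of train.py. A sketch of what a small character-level config might look like (the file name is hypothetical, the variable names follow train.py, and the values are illustrative):

```python
# config/my_char_model.py (hypothetical file name)
# Each assignment overrides a global of the same name in train.py.
out_dir = "out-my-char-model"   # where checkpoints are written
dataset = "shakespeare_char"    # expects data/shakespeare_char/ to be prepared first
batch_size = 64
block_size = 256                # context length
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
learning_rate = 1e-3
max_iters = 5000
compile = False                 # disable torch.compile where it is unavailable
```

A config is passed as the first argument to the trainer (e.g. python train.py config/train_shakespeare_char.py), and individual settings can be overridden further on the command line, e.g. --device=mps or --compile=False.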

Maintenance & Community

The project is written and maintained by Andrej Karpathy and is sponsored by Lambda Labs. Discussions and support are available on Discord.

Licensing & Compatibility

MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Some features, such as FSDP integration, are still pending. PyTorch 2.0's torch.compile is experimental and may not be available on all platforms (e.g., Windows); in those cases pass the --compile=False flag.
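
Within the repo the workaround is simply the config override above (--compile=False). For standalone code built on model.py, a rough fallback guard might look like the sketch below (the platform check is an assumption for illustration, not something train.py does):

```python
import sys
import torch
from model import GPTConfig, GPT  # model.py from the nanoGPT repo

model = GPT(GPTConfig())  # default config; stand-in for a real setup

# Only compile where torch.compile exists (PyTorch >= 2.0) and the platform is
# known to support it; otherwise fall back to eager execution.
if hasattr(torch, "compile") and sys.platform != "win32":
    model = torch.compile(model)
```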

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 2
  • Star History: 2,498 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM
Transformer library for flexible model development
Top 0.3% · 1k stars · created 3 years ago · updated 7 months ago

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 5 more.

gpt-neo by EleutherAI
GPT-2/3-style model implementation using mesh-tensorflow
Top 0.0% · 8k stars · created 5 years ago · updated 3 years ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai
Deep learning optimization library for distributed training and inference
Top 0.2% · 40k stars · created 5 years ago · updated 21 hours ago