Minimalist repo for training/finetuning GPT models
nanoGPT provides a minimalist and efficient codebase for training and fine-tuning medium-sized GPT models. It's designed for researchers and practitioners who need a simple, hackable implementation to experiment with GPT architectures, reproduce existing results, or train custom language models. The project emphasizes clarity and performance, allowing users to train models from scratch or fine-tune pre-trained checkpoints.
How It Works
nanoGPT is built around a core set of Python files: train.py for the training loop and model.py for the GPT model definition. It leverages PyTorch for its deep learning operations and includes optional integrations with libraries like transformers and datasets for loading pre-trained weights and data. The design prioritizes readability: the training loop and the model definition are each roughly 300 lines of code, making them easy to understand and modify. It also supports PyTorch 2.0's torch.compile for significant speedups.
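To make the workflow concrete, a typical end-to-end run looks roughly like the following; the script and config names (data/shakespeare_char/prepare.py, config/train_shakespeare_char.py, out-shakespeare-char) are taken from the upstream nanoGPT repository and may differ for other datasets or repository versions:

python data/shakespeare_char/prepare.py            # tokenize the tiny Shakespeare dataset into binary train/val files
python train.py config/train_shakespeare_char.py   # run the training loop in train.py using the GPT defined in model.py
python sample.py --out_dir=out-shakespeare-char    # generate text from the trained checkpoint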
Quick Start & Requirements
Install the dependencies:
pip install torch numpy transformers datasets tiktoken wandb tqdm
On an Apple Silicon Mac, the --device=mps flag lets train.py train on the GPU via PyTorch's MPS backend.
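For example, to override the device on an Apple Silicon Mac (the config file name is borrowed from the repository's character-level Shakespeare example and may need adjusting for your setup):

python train.py config/train_shakespeare_char.py --device=mps   # use the Apple GPU via the MPS backend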
Highlighted Details
Maintenance & Community
The project is actively developed by Andrej Karpathy and sponsored by Lambda Labs; discussions and support are available on Discord.
Licensing & Compatibility
MIT License. Permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The project is under active development, and some features, such as FSDP integration, are still pending. PyTorch 2.0's torch.compile is experimental and may not be available on all platforms (e.g., Windows); in such cases, pass the --compile=False flag.
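For example, on a platform where torch.compile is unsupported, training can be launched with compilation disabled (config name again borrowed from the repository's Shakespeare example):

python train.py config/train_shakespeare_char.py --compile=False   # fall back to eager-mode PyTorch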