nanoGPT: Fastest, simplest GPT training and finetuning
Top 99.4% on SourcePulse
Summary
nanoGPT provides a minimalist and highly efficient codebase for training and fine-tuning medium-sized Generative Pre-trained Transformer (GPT) models. It targets researchers and engineers seeking a straightforward, hackable implementation to rapidly iterate on GPT architectures, offering significant performance gains through modern PyTorch features.
How It Works
This project prioritizes functional simplicity and performance over extensive documentation, featuring a core training loop (train.py) and GPT model definition (model.py) of roughly 300 lines each. It leverages PyTorch 2.0's torch.compile for substantial speedups in training iterations. The codebase supports training from scratch or fine-tuning existing checkpoints, with a simple data flow: preparation scripts tokenize raw text into binary token files that the training loop streams as batches into the model.
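To illustrate the torch.compile pattern described above, here is a minimal, self-contained sketch; TinyGPT is a hypothetical stand-in, not nanoGPT's actual GPT class from model.py, and the sizes and hyperparameters are illustrative assumptions (PyTorch 2.0+ required):

import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    # Hypothetical stand-in for nanoGPT's GPT class; no causal mask here,
    # whereas a real GPT block uses causal self-attention.
    def __init__(self, vocab_size=50304, n_embd=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        self.block = nn.TransformerEncoderLayer(d_model=n_embd, nhead=4, batch_first=True)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        return self.head(self.block(self.embed(idx)))

model = TinyGPT()
model = torch.compile(model)  # PyTorch 2.0: JIT-compiles the model for faster training steps

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
idx = torch.randint(0, 50304, (8, 64))       # dummy batch of token ids, shape (batch, time)
targets = torch.randint(0, 50304, (8, 64))   # dummy next-token targets

logits = model(idx)                          # first call triggers compilation; later calls reuse it
loss = nn.functional.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()

In nanoGPT itself, compilation is gated behind a compile flag in the training configuration, so it can be disabled where torch.compile is problematic (for example on Windows, per the caveats below).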
Quick Start & Requirements
Install the dependencies with:
pip install torch numpy transformers datasets tiktoken wandb tqdm
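For a first run, the upstream README walks through a character-level Shakespeare example; the commands below follow that flow and assume a local checkout of the nanoGPT repository (script and config names come from upstream, not from this summary):

python data/shakespeare_char/prepare.py           # tokenize the tiny Shakespeare dataset
python train.py config/train_shakespeare_char.py  # train a small character-level GPT
python sample.py --out_dir=out-shakespeare-char   # generate samples from the trained checkpoint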
Highlighted Details
Leverages PyTorch 2.0's torch.compile, substantially reducing training iteration times.
Maintenance & Community
The project is noted as "still under active development" with a list of future improvements ("todos"). Community discussions and support are available via the #nanoGPT channel on Discord. The project acknowledges sponsorship from Lambda labs.
Licensing & Compatibility
The provided README does not specify a software license. This omission presents a significant adoption blocker, as the terms for use, modification, and distribution are unclear. Compatibility for commercial use or linking with closed-source projects cannot be determined without a license.
Limitations & Caveats
The repository is explicitly "still under active development," indicating potential for breaking changes and incomplete features (e.g., FSDP integration is listed as a todo). The reliance on PyTorch 2.0 torch.compile may introduce experimental behavior or platform-specific issues (e.g., noted problems on Windows). The codebase prioritizes simplicity, potentially omitting advanced features found in larger frameworks.