nanoGPT

Fastest, simplest GPT training and finetuning

Created 1 year ago
253 stars

Top 99.4% on SourcePulse

View on GitHub
Project Summary

nanoGPT provides a minimalist and highly efficient codebase for training and fine-tuning medium-sized Generative Pre-trained Transformer (GPT) models. It targets researchers and engineers seeking a straightforward, hackable implementation to rapidly iterate on GPT architectures, offering significant performance gains through modern PyTorch features.

How It Works

This project prioritizes functional simplicity and performance over extensive documentation, featuring a core training loop (train.py) and GPT model definition (model.py) each around 300 lines. It leverages PyTorch 2.0's torch.compile for substantial speedups in training iterations. The architecture supports training from scratch or fine-tuning existing checkpoints, with a clear data flow for tokenization and model execution.
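
The summary above does not include code, so here is a minimal sketch of the torch.compile training pattern it describes. The stand-in model, the padded GPT-2 vocabulary size of 50304, and the learning rate are illustrative assumptions, not nanoGPT's actual train.py or model.py:

```python
# Minimal sketch of the torch.compile training pattern described above.
# Illustrative only; nanoGPT's real train.py and model.py differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder standing in for nanoGPT's GPT class (the real model has
# transformer blocks between the embedding and the output head).
model = nn.Sequential(
    nn.Embedding(50304, 768),   # token embeddings (padded GPT-2 vocab size)
    nn.Linear(768, 50304),      # toy output head
).to(device)

# PyTorch 2.0: compile once; subsequent forward/backward passes run through
# fused kernels, which is where the iteration-time speedup comes from.
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)

def train_step(x, y):
    """One optimization step on a batch of token ids x with targets y."""
    logits = model(x)                                    # (batch, seq, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Compilation only changes how the forward and backward passes execute, not the training logic, so the same loop applies whether a model is trained from scratch or restored from a checkpoint for fine-tuning.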

Quick Start & Requirements

  • Installation: pip install torch numpy transformers datasets tiktoken wandb tqdm
  • Prerequisites: PyTorch (GPU recommended, MPS support for Apple Silicon), NumPy, Hugging Face Transformers, Hugging Face Datasets, Tiktoken. Optional: Wandb, tqdm.
  • Setup: A character-level GPT on Shakespeare trains in roughly 3 minutes, whether on a single GPU or on CPU (a tokenization sketch follows this list). Reproducing GPT-2 (124M) on OpenWebText requires an 8x A100 40GB node and approximately 4 days.
  • Links: #nanoGPT channel on Discord
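
To show where the tiktoken and NumPy prerequisites fit, the data-preparation step before training typically encodes raw text into a flat array of token ids. The file names and the uint16 dtype below are assumptions made for this sketch, not a description of the repository's prepare scripts:

```python
# Illustrative data-prep step: encode raw text into GPT-2 BPE token ids.
# Paths and dtype are assumptions for this sketch.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")        # GPT-2 byte-pair encoding

with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

ids = enc.encode_ordinary(text)            # token ids, no special tokens
print(f"{len(ids):,} tokens")

# Store as a flat binary file the training loop can read or memory-map.
np.array(ids, dtype=np.uint16).tofile("train.bin")
```

A uint16 array suffices here because the GPT-2 vocabulary stays below 65,536 entries; a larger vocabulary would need a wider dtype.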

Highlighted Details

  • Reproduces GPT-2 (124M) performance on OpenWebText in ~4 days using 8xA100 GPUs.
  • Achieves significant speedups via PyTorch 2.0 torch.compile, reducing iteration times.
  • Minimalist codebase (~600 lines for core training and model definition) enhances readability and hackability.
  • Supports training on CPU, single GPU, multi-GPU nodes, and Apple Silicon MPS.
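
As an illustration of the backend support listed above, device selection in PyTorch can be sketched as follows; the auto-detection fallback order is an assumption of this example, not a description of nanoGPT's configuration handling:

```python
# Pick a backend: CUDA GPU, Apple Silicon MPS, or plain CPU.
# The auto-detection order is illustrative only.
import torch

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# Dummy batch of token ids, just to confirm tensors land on the chosen backend.
x = torch.randint(0, 50304, (8, 256), device=device)
print(f"device: {device}, batch shape: {tuple(x.shape)}")
```

Multi-GPU training additionally relies on PyTorch's distributed tooling; those details are outside the scope of this summary.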

Maintenance & Community

The project is noted as "still under active development", with a list of planned improvements ("todos"). Community discussion and support are available via the #nanoGPT channel on Discord. The project acknowledges sponsorship from Lambda Labs.

Licensing & Compatibility

The provided README does not specify a software license. This omission presents a significant adoption blocker, as the terms for use, modification, and distribution are unclear. Compatibility for commercial use or linking with closed-source projects cannot be determined without a license.

Limitations & Caveats

The repository is explicitly "still under active development," indicating potential for breaking changes and incomplete features (e.g., FSDP integration is listed as a todo). The reliance on PyTorch 2.0 torch.compile may introduce experimental behavior or platform-specific issues (e.g., noted problems on Windows). The codebase prioritizes simplicity, potentially omitting advanced features found in larger frameworks.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 30 days

Explore Similar Projects

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Casper Hansen (Author of AutoAWQ), and 1 more.

GPT2 by ConnorJL

0%
1k
GPT2 training implementation, supporting TPUs and GPUs
Created 6 years ago
Updated 3 years ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of ClickHouse), and 29 more.

llm.c by karpathy

0.2%
29k
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago
Updated 6 months ago