nanoGPT

Fastest, simplest GPT training and finetuning

Created 1 year ago
253 stars

Top 99.4% on SourcePulse

View on GitHub
Project Summary

nanoGPT provides a minimalist and highly efficient codebase for training and fine-tuning medium-sized Generative Pre-trained Transformer (GPT) models. It targets researchers and engineers seeking a straightforward, hackable implementation to rapidly iterate on GPT architectures, offering significant performance gains through modern PyTorch features.

How It Works

This project prioritizes functional simplicity and performance over extensive documentation, featuring a core training loop (train.py) and GPT model definition (model.py) each around 300 lines. It leverages PyTorch 2.0's torch.compile for substantial speedups in training iterations. The architecture supports training from scratch or fine-tuning existing checkpoints, with a clear data flow for tokenization and model execution.
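
The summary above does not include code, so here is a minimal sketch of the torch.compile training pattern it describes. The stand-in model, the padded GPT-2 vocabulary size of 50304, and the learning rate are illustrative assumptions, not nanoGPT's actual train.py or model.py:

```python
# Minimal sketch of the torch.compile training pattern described above.
# Illustrative only; nanoGPT's real train.py and model.py differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder standing in for nanoGPT's GPT class (the real model has
# transformer blocks between the embedding and the output head).
model = nn.Sequential(
    nn.Embedding(50304, 768),   # token embeddings (padded GPT-2 vocab size)
    nn.Linear(768, 50304),      # toy output head
).to(device)

# PyTorch 2.0: compile once; subsequent forward/backward passes run through
# fused kernels, which is where the iteration-time speedup comes from.
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)

def train_step(x, y):
    """One optimization step on a batch of token ids x with targets y."""
    logits = model(x)                                    # (batch, seq, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Compilation only changes how the forward and backward passes execute, not the training logic, so the same loop applies whether a model is trained from scratch or restored from a checkpoint for fine-tuning.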

Quick Start & Requirements

  • Installation: pip install torch numpy transformers datasets tiktoken wandb tqdm
  • Prerequisites: PyTorch (GPU recommended, MPS support for Apple Silicon), NumPy, Hugging Face Transformers, Hugging Face Datasets, Tiktoken. Optional: Wandb, tqdm.
  • Setup: A character-level GPT on Shakespeare trains in roughly 3 minutes, whether on a single GPU or on CPU (a tokenization sketch follows this list). Reproducing GPT-2 (124M) on OpenWebText requires an 8x A100 40GB node and approximately 4 days.
  • Links: #nanoGPT channel on Discord
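
To show where the tiktoken and NumPy prerequisites fit, the data-preparation step before training typically encodes raw text into a flat array of token ids. The file names and the uint16 dtype below are assumptions made for this sketch, not a description of the repository's prepare scripts:

```python
# Illustrative data-prep step: encode raw text into GPT-2 BPE token ids.
# Paths and dtype are assumptions for this sketch.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")        # GPT-2 byte-pair encoding

with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

ids = enc.encode_ordinary(text)            # token ids, no special tokens
print(f"{len(ids):,} tokens")

# Store as a flat binary file the training loop can read or memory-map.
np.array(ids, dtype=np.uint16).tofile("train.bin")
```

A uint16 array suffices here because the GPT-2 vocabulary stays below 65,536 entries; a larger vocabulary would need a wider dtype.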

Highlighted Details

  • Reproduces GPT-2 (124M) performance on OpenWebText in ~4 days using 8xA100 GPUs.
  • Achieves significant speedups via PyTorch 2.0 torch.compile, reducing iteration times.
  • Minimalist codebase (~600 lines for core training and model definition) enhances readability and hackability.
  • Supports training on CPU, single GPU, multi-GPU nodes, and Apple Silicon MPS.
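
As an illustration of the backend support listed above, device selection in PyTorch can be sketched as follows; the auto-detection fallback order is an assumption of this example, not a description of nanoGPT's configuration handling:

```python
# Pick a backend: CUDA GPU, Apple Silicon MPS, or plain CPU.
# The auto-detection order is illustrative only.
import torch

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# Dummy batch of token ids, just to confirm tensors land on the chosen backend.
x = torch.randint(0, 50304, (8, 256), device=device)
print(f"device: {device}, batch shape: {tuple(x.shape)}")
```

Multi-GPU training additionally relies on PyTorch's distributed tooling; those details are outside the scope of this summary.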

Maintenance & Community

The project is noted as "still under active development", with a list of planned improvements ("todos"). Community discussion and support are available via the #nanoGPT channel on Discord. The project acknowledges sponsorship from Lambda Labs.

Licensing & Compatibility

The provided README does not specify a software license. This omission presents a significant adoption blocker, as the terms for use, modification, and distribution are unclear. Compatibility for commercial use or linking with closed-source projects cannot be determined without a license.

Limitations & Caveats

The repository is explicitly "still under active development," indicating potential for breaking changes and incomplete features (e.g., FSDP integration is listed as a todo). The reliance on PyTorch 2.0 torch.compile may introduce experimental behavior or platform-specific issues (e.g., noted problems on Windows). The codebase prioritizes simplicity, potentially omitting advanced features found in larger frameworks.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 30 days

Explore Similar Projects

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Casper Hansen (Author of AutoAWQ), and 1 more.

GPT2 by ConnorJL

0%
1k
GPT2 training implementation, supporting TPUs and GPUs
Created 6 years ago
Updated 3 years ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of ClickHouse), and 29 more.

llm.c by karpathy

0.2%
29k
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago
Updated 6 months ago