minGPT by karpathy

Minimal PyTorch re-implementation of GPT training and inference

Created 5 years ago · 22,361 stars · Top 1.9% on sourcepulse

Project Summary

minGPT is a minimal, educational PyTorch implementation of OpenAI's GPT architecture, aimed at researchers and developers who want to understand or build upon transformer-based language models. The core model is a clean, ~300-line implementation covering both training and inference, stripping away the incidental complexity found in larger frameworks.

How It Works

minGPT implements a decoder-only Transformer. It processes sequences of token indices through stacked self-attention and feed-forward layers, producing a probability distribution over the next token at each position. The implementation emphasizes efficient batching, both across examples and along the sequence dimension, which is one of the main complexities in making Transformer training fast. It also includes a Byte Pair Encoding (BPE) tokenizer matching OpenAI's GPT implementation.
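To make the autoregressive loop concrete, here is a minimal sampling sketch. It is illustrative only, not minGPT's own code: it assumes a model whose forward pass maps a batch of token indices to next-token logits.

```python
# Illustrative next-token sampling loop (a sketch, not minGPT's own code).
# Assumes `model` maps a (B, T) LongTensor of token indices to
# (B, T, vocab_size) logits.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, idx, max_new_tokens, block_size):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]              # crop context to the model's window
        logits = model(idx_cond)                     # (B, T, vocab_size)
        probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution over the next token
        next_token = torch.multinomial(probs, 1)     # sample one token per sequence
        idx = torch.cat([idx, next_token], dim=1)    # append and feed back in
    return idx
```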

Quick Start & Requirements

  • Install: git clone https://github.com/karpathy/minGPT.git followed by pip install -e . (a usage sketch follows this list).
  • Prerequisites: PyTorch. Loading a specific pretrained configuration (e.g., GPT-2) requires the matching vocab_size and block_size.
  • Demos: demo.ipynb and generate.ipynb provide worked examples.
  • Official docs: https://github.com/karpathy/minGPT
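
The workflow documented in the upstream README is to build a GPT from a config object, then hand it to a Trainer. The sketch below follows that pattern; train_dataset is a placeholder for a user-supplied torch.utils.data.Dataset yielding tensors of token indices, and the hyperparameter values are illustrative.

```python
# Configure and train a GPT, following the usage pattern in the minGPT README.
# `train_dataset` is a placeholder for your own torch.utils.data.Dataset that
# yields LongTensors of token indices (values in [0, vocab_size)).
from mingpt.model import GPT
from mingpt.trainer import Trainer

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2'
model_config.vocab_size = 50257  # GPT-2 vocabulary size
model_config.block_size = 1024   # GPT-2 context length
model = GPT(model_config)

train_config = Trainer.get_default_config()
train_config.learning_rate = 5e-4  # illustrative hyperparameters
train_config.max_iters = 1000
train_config.batch_size = 32
trainer = Trainer(train_config, model, train_dataset)
trainer.run()
```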

Highlighted Details

  • Minimalist design: Core model implementation is ~300 lines of PyTorch.
  • Educational focus: Aims for clarity and interpretability.
  • Reproduces OpenAI BPE: Includes a Byte Pair Encoder matching OpenAI's GPT implementation (see the tokenizer sketch after this list).
  • Example projects: Demonstrates training for tasks like arithmetic (projects/adder) and character-level language modeling (projects/chargpt).
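
As a quick round trip through the bundled encoder, here is a sketch assuming the BPETokenizer class in mingpt/bpe.py; verify the method names against your checkout, as they may differ across revisions.

```python
# Round-trip sanity check for the bundled GPT-2 BPE encoder. Assumes the
# BPETokenizer class in mingpt/bpe.py; method names may differ by revision.
from mingpt.bpe import BPETokenizer

tokenizer = BPETokenizer()
idx = tokenizer("Hello, world!")  # text -> (1, T) LongTensor of token indices
text = tokenizer.decode(idx[0])   # token indices -> text; inverse of encoding
assert text == "Hello, world!"
```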

Maintenance & Community

The project has been in a semi-archived state since January 2023, with the author recommending nanoGPT for more recent developments. Contributions are still accepted, but major changes are unlikely to land.

Licensing & Compatibility

MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is semi-archived, and the author recommends nanoGPT for more active development and newer features. Unit test coverage is not comprehensive, and the README notes that the repository ships without a requirements.txt file.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
603 stars in the last 90 days

