minGPT by karpathy

Minimal PyTorch re-implementation of GPT training and inference

Created 5 years ago · 22,361 stars · Top 1.9% on sourcepulse

Project Summary

minGPT is a minimal, educational PyTorch implementation of OpenAI's GPT architecture, aimed at researchers and developers who want to understand or build upon transformer-based language models. The core model is a clean, ~300-line implementation covering both training and inference, stripping away the incidental complexity found in larger frameworks.

How It Works

minGPT implements a decoder-only Transformer. It processes sequences of token indices through stacked self-attention and feed-forward layers, producing a probability distribution over the next token at each position. The implementation emphasizes efficient batching, both across examples and along the sequence dimension, which is one of the main complexities in making Transformer training fast. It also includes a Byte Pair Encoding (BPE) tokenizer matching OpenAI's GPT implementation.
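To make the autoregressive loop concrete, here is a minimal sampling sketch. It is illustrative only, not minGPT's own code: it assumes a model whose forward pass maps a batch of token indices to next-token logits.

```python
# Illustrative next-token sampling loop (a sketch, not minGPT's own code).
# Assumes `model` maps a (B, T) LongTensor of token indices to
# (B, T, vocab_size) logits.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, idx, max_new_tokens, block_size):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]              # crop context to the model's window
        logits = model(idx_cond)                     # (B, T, vocab_size)
        probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution over the next token
        next_token = torch.multinomial(probs, 1)     # sample one token per sequence
        idx = torch.cat([idx, next_token], dim=1)    # append and feed back in
    return idx
```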

Quick Start & Requirements

  • Install: git clone https://github.com/karpathy/minGPT.git followed by pip install -e . (a usage sketch follows this list).
  • Prerequisites: PyTorch. Loading a specific pretrained configuration (e.g., GPT-2) requires the matching vocab_size and block_size.
  • Demos: demo.ipynb and generate.ipynb provide worked examples.
  • Official docs: https://github.com/karpathy/minGPT
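
The workflow documented in the upstream README is to build a GPT from a config object, then hand it to a Trainer. The sketch below follows that pattern; train_dataset is a placeholder for a user-supplied torch.utils.data.Dataset yielding tensors of token indices, and the hyperparameter values are illustrative.

```python
# Configure and train a GPT, following the usage pattern in the minGPT README.
# `train_dataset` is a placeholder for your own torch.utils.data.Dataset that
# yields LongTensors of token indices (values in [0, vocab_size)).
from mingpt.model import GPT
from mingpt.trainer import Trainer

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2'
model_config.vocab_size = 50257  # GPT-2 vocabulary size
model_config.block_size = 1024   # GPT-2 context length
model = GPT(model_config)

train_config = Trainer.get_default_config()
train_config.learning_rate = 5e-4  # illustrative hyperparameters
train_config.max_iters = 1000
train_config.batch_size = 32
trainer = Trainer(train_config, model, train_dataset)
trainer.run()
```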

Highlighted Details

  • Minimalist design: Core model implementation is ~300 lines of PyTorch.
  • Educational focus: Aims for clarity and interpretability.
  • Reproduces OpenAI BPE: Includes a Byte Pair Encoder matching OpenAI's GPT implementation (see the tokenizer sketch after this list).
  • Example projects: Demonstrates training for tasks like arithmetic (projects/adder) and character-level language modeling (projects/chargpt).
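
As a quick round trip through the bundled encoder, here is a sketch assuming the BPETokenizer class in mingpt/bpe.py; verify the method names against your checkout, as they may differ across revisions.

```python
# Round-trip sanity check for the bundled GPT-2 BPE encoder. Assumes the
# BPETokenizer class in mingpt/bpe.py; method names may differ by revision.
from mingpt.bpe import BPETokenizer

tokenizer = BPETokenizer()
idx = tokenizer("Hello, world!")  # text -> (1, T) LongTensor of token indices
text = tokenizer.decode(idx[0])   # token indices -> text; inverse of encoding
assert text == "Hello, world!"
```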

Maintenance & Community

The project has been in a semi-archived state since January 2023, with the author recommending nanoGPT for more recent developments. Contributions are still accepted, but major changes are unlikely to land.

Licensing & Compatibility

MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The project is semi-archived, and the author recommends nanoGPT for more active development and newer features. Unit test coverage is not comprehensive, and the README notes that the repository ships without a requirements.txt file.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
603 stars in the last 90 days

