mlx-gpt2 by pranavjad

Minimal GPT-2 implementation for educational purposes

created 1 year ago
393 stars

Top 74.3% on sourcepulse

Project Summary

This repository provides a concise, ~200-line Python implementation of the GPT-2 architecture from scratch using the MLX framework. It's designed for educational purposes, allowing users to understand the core components of a transformer-based language model by building one and training it on a small dataset. The primary benefit is a clear, step-by-step guide to implementing a GPT-style language model.

How It Works

The implementation follows the standard GPT-2 architecture, starting with character-level tokenization of Shakespearean text. It builds the model layer by layer: input embeddings, positional embeddings, multi-head self-attention (with causal masking), feed-forward networks (MLPs), layer normalization, and skip connections. MLX's tensor operations and automatic differentiation are leveraged throughout for efficient computation and gradient calculation. The training loop uses AdamW optimization and cross-entropy loss.
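
As a rough illustration of the pieces described above, here is a minimal sketch of a GPT-style block and model written with MLX. It is not the repository's code: it uses MLX's built-in nn.MultiHeadAttention with an additive causal mask rather than the repo's explicit attention, and the names and hyperparameters (TinyGPT, n_embd, n_head, n_layer, block_size) are illustrative placeholders.

    # Minimal sketch of a GPT-style model in MLX -- illustrative, not the repo's exact code.
    import mlx.core as mx
    import mlx.nn as nn

    class Block(nn.Module):
        """Pre-norm transformer block: causal self-attention + MLP, each with a skip connection."""
        def __init__(self, n_embd, n_head):
            super().__init__()
            self.ln1 = nn.LayerNorm(n_embd)
            self.ln2 = nn.LayerNorm(n_embd)
            self.attn = nn.MultiHeadAttention(n_embd, n_head)
            self.mlp = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
            )

        def __call__(self, x, mask):
            h = self.ln1(x)
            x = x + self.attn(h, h, h, mask)  # attention output added back via a skip connection
            x = x + self.mlp(self.ln2(x))     # feed-forward network with its own skip connection
            return x

    class TinyGPT(nn.Module):
        def __init__(self, vocab_size, n_embd=128, n_head=4, n_layer=4, block_size=64):
            super().__init__()
            self.block_size = block_size
            self.tok_emb = nn.Embedding(vocab_size, n_embd)   # input (token) embeddings
            self.pos_emb = nn.Embedding(block_size, n_embd)   # learned positional embeddings
            self.blocks = [Block(n_embd, n_head) for _ in range(n_layer)]
            self.ln_f = nn.LayerNorm(n_embd)
            self.head = nn.Linear(n_embd, vocab_size, bias=False)

        def __call__(self, idx):
            B, T = idx.shape
            x = self.tok_emb(idx) + self.pos_emb(mx.arange(T))
            mask = nn.MultiHeadAttention.create_additive_causal_mask(T)  # hide future positions
            for block in self.blocks:
                x = block(x, mask)
            return self.head(self.ln_f(x))    # logits over the character vocabulary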

Quick Start & Requirements

  • Install: pip install mlx numpy
  • Data: Requires input.txt containing training text (e.g., Shakespeare).
  • Run: Execute the train.py script (a sketch of a training step follows this list).
  • Hardware: Optimized for Apple Silicon (MLX). Training on a MacBook takes approximately 10 minutes.
  • Docs: Detailed explanations are within the README.
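
The training step can be approximated with MLX's function-transformation API. The following is a hedged sketch that reuses the TinyGPT model from the "How It Works" snippet above; get_batch, the batch shapes, and the hyperparameters are placeholders rather than the repo's actual train.py.

    # Sketch of a training step with AdamW and cross-entropy loss (placeholder data).
    import mlx.core as mx
    import mlx.nn as nn
    import mlx.optimizers as optim

    def get_batch(batch_size=8, block_size=64, vocab_size=65):
        # Placeholder: random token ids stand in for batches drawn from input.txt.
        data = mx.random.randint(0, vocab_size, (batch_size, block_size + 1))
        return data[:, :-1], data[:, 1:]   # inputs and next-token targets

    def loss_fn(model, x, y):
        logits = model(x)                  # (batch, time, vocab)
        return nn.losses.cross_entropy(
            logits.reshape(-1, logits.shape[-1]), y.reshape(-1), reduction="mean"
        )

    model = TinyGPT(vocab_size=65)         # from the sketch in "How It Works"
    optimizer = optim.AdamW(learning_rate=3e-4)
    loss_and_grad = nn.value_and_grad(model, loss_fn)

    for step in range(1000):
        x, y = get_batch()
        loss, grads = loss_and_grad(model, x, y)      # forward pass + gradients via autodiff
        optimizer.update(model, grads)                # AdamW parameter update
        mx.eval(model.parameters(), optimizer.state)  # force MLX's lazy graph to evaluate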

Highlighted Details

  • Character-level tokenization for simplicity.
  • Explicit implementation of multi-head attention using mx.as_strided for efficiency.
  • Custom weight initialization following GPT-2 paper recommendations for residual connections.
  • Generation function to produce text samples after training (a tokenization-and-sampling sketch follows this list).
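
To make the first and last bullets concrete, here is a hedged sketch of character-level tokenization over input.txt and a sampling-based generation helper. The function names (encode, decode, generate) are illustrative, not necessarily those used in the repository, and the model argument is assumed to be the TinyGPT sketch above.

    # Character-level tokenization and autoregressive sampling (illustrative names).
    import mlx.core as mx

    text = open("input.txt").read()
    chars = sorted(set(text))                     # vocabulary = unique characters
    stoi = {c: i for i, c in enumerate(chars)}    # char -> integer id
    itos = {i: c for c, i in stoi.items()}        # integer id -> char

    def encode(s):
        return mx.array([stoi[c] for c in s])

    def decode(ids):
        return "".join(itos[int(i)] for i in ids)

    def generate(model, prompt, max_new_tokens=200, block_size=64):
        ids = encode(prompt)[None, :]                          # add a batch dimension
        for _ in range(max_new_tokens):
            logits = model(ids[:, -block_size:])               # crop to the context window
            next_id = mx.random.categorical(logits[:, -1, :])  # sample from the last position
            ids = mx.concatenate([ids, next_id[None, :]], axis=1)
        return decode(ids[0].tolist())

After training, calling generate on a short prompt produces the Shakespeare-flavored but largely nonsensical text described under Limitations & Caveats.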

Maintenance & Community

This appears to be a personal project by pranavjad, focused on a specific educational goal. There's no indication of ongoing community development or formal maintenance.

Licensing & Compatibility

The repository does not state a license. In the absence of one, default copyright applies, so the code cannot be safely reused in commercial or closed-source projects without the author's permission. Given the project's educational focus, treating it as a reference implementation is the safest assumption until licensing is clarified.

Limitations & Caveats

The model is a highly simplified version of GPT-2, trained on a small dataset. The character-level tokenization and small model size result in generated text that is coherent in form but nonsensical. It is not intended for practical NLP tasks but rather as a learning tool.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days
