mlx-gpt2 by pranavjad

Minimal GPT-2 implementation for educational purposes

created 1 year ago
393 stars

Top 74.3% on sourcepulse

Project Summary

This repository provides a concise, ~200-line Python implementation of the GPT-2 architecture from scratch using the MLX framework. It's designed for educational purposes, allowing users to understand the core components of a transformer-based language model by building one and training it on a small dataset. The primary benefit is a clear, step-by-step guide to implementing a GPT-style language model.

How It Works

The implementation follows the standard GPT-2 architecture, starting with character-level tokenization of Shakespearean text. It builds the model layer by layer: input embeddings, positional embeddings, multi-head self-attention (with causal masking), feed-forward networks (MLPs), layer normalization, and skip connections. MLX's tensor operations and automatic differentiation are leveraged throughout for efficient computation and gradient calculation. The training loop uses AdamW optimization and cross-entropy loss.
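
As a rough illustration of the pieces described above, here is a minimal sketch of a GPT-style block and model written with MLX. It is not the repository's code: it uses MLX's built-in nn.MultiHeadAttention with an additive causal mask rather than the repo's explicit attention, and the names and hyperparameters (TinyGPT, n_embd, n_head, n_layer, block_size) are illustrative placeholders.

    # Minimal sketch of a GPT-style model in MLX -- illustrative, not the repo's exact code.
    import mlx.core as mx
    import mlx.nn as nn

    class Block(nn.Module):
        """Pre-norm transformer block: causal self-attention + MLP, each with a skip connection."""
        def __init__(self, n_embd, n_head):
            super().__init__()
            self.ln1 = nn.LayerNorm(n_embd)
            self.ln2 = nn.LayerNorm(n_embd)
            self.attn = nn.MultiHeadAttention(n_embd, n_head)
            self.mlp = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
            )

        def __call__(self, x, mask):
            h = self.ln1(x)
            x = x + self.attn(h, h, h, mask)  # attention output added back via a skip connection
            x = x + self.mlp(self.ln2(x))     # feed-forward network with its own skip connection
            return x

    class TinyGPT(nn.Module):
        def __init__(self, vocab_size, n_embd=128, n_head=4, n_layer=4, block_size=64):
            super().__init__()
            self.block_size = block_size
            self.tok_emb = nn.Embedding(vocab_size, n_embd)   # input (token) embeddings
            self.pos_emb = nn.Embedding(block_size, n_embd)   # learned positional embeddings
            self.blocks = [Block(n_embd, n_head) for _ in range(n_layer)]
            self.ln_f = nn.LayerNorm(n_embd)
            self.head = nn.Linear(n_embd, vocab_size, bias=False)

        def __call__(self, idx):
            B, T = idx.shape
            x = self.tok_emb(idx) + self.pos_emb(mx.arange(T))
            mask = nn.MultiHeadAttention.create_additive_causal_mask(T)  # hide future positions
            for block in self.blocks:
                x = block(x, mask)
            return self.head(self.ln_f(x))    # logits over the character vocabulary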

Quick Start & Requirements

  • Install: pip install mlx numpy
  • Data: Requires input.txt containing training text (e.g., Shakespeare).
  • Run: Execute the train.py script (a sketch of a training step follows this list).
  • Hardware: Optimized for Apple Silicon (MLX). Training on a MacBook takes approximately 10 minutes.
  • Docs: Detailed explanations are within the README.
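
The training step can be approximated with MLX's function-transformation API. The following is a hedged sketch that reuses the TinyGPT model from the "How It Works" snippet above; get_batch, the batch shapes, and the hyperparameters are placeholders rather than the repo's actual train.py.

    # Sketch of a training step with AdamW and cross-entropy loss (placeholder data).
    import mlx.core as mx
    import mlx.nn as nn
    import mlx.optimizers as optim

    def get_batch(batch_size=8, block_size=64, vocab_size=65):
        # Placeholder: random token ids stand in for batches drawn from input.txt.
        data = mx.random.randint(0, vocab_size, (batch_size, block_size + 1))
        return data[:, :-1], data[:, 1:]   # inputs and next-token targets

    def loss_fn(model, x, y):
        logits = model(x)                  # (batch, time, vocab)
        return nn.losses.cross_entropy(
            logits.reshape(-1, logits.shape[-1]), y.reshape(-1), reduction="mean"
        )

    model = TinyGPT(vocab_size=65)         # from the sketch in "How It Works"
    optimizer = optim.AdamW(learning_rate=3e-4)
    loss_and_grad = nn.value_and_grad(model, loss_fn)

    for step in range(1000):
        x, y = get_batch()
        loss, grads = loss_and_grad(model, x, y)      # forward pass + gradients via autodiff
        optimizer.update(model, grads)                # AdamW parameter update
        mx.eval(model.parameters(), optimizer.state)  # force MLX's lazy graph to evaluate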

Highlighted Details

  • Character-level tokenization for simplicity.
  • Explicit implementation of multi-head attention using mx.as_strided for efficiency.
  • Custom weight initialization following GPT-2 paper recommendations for residual connections.
  • Generation function to produce text samples after training (a tokenization-and-sampling sketch follows this list).
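
To make the first and last bullets concrete, here is a hedged sketch of character-level tokenization over input.txt and a sampling-based generation helper. The function names (encode, decode, generate) are illustrative, not necessarily those used in the repository, and the model argument is assumed to be the TinyGPT sketch above.

    # Character-level tokenization and autoregressive sampling (illustrative names).
    import mlx.core as mx

    text = open("input.txt").read()
    chars = sorted(set(text))                     # vocabulary = unique characters
    stoi = {c: i for i, c in enumerate(chars)}    # char -> integer id
    itos = {i: c for c, i in stoi.items()}        # integer id -> char

    def encode(s):
        return mx.array([stoi[c] for c in s])

    def decode(ids):
        return "".join(itos[int(i)] for i in ids)

    def generate(model, prompt, max_new_tokens=200, block_size=64):
        ids = encode(prompt)[None, :]                          # add a batch dimension
        for _ in range(max_new_tokens):
            logits = model(ids[:, -block_size:])               # crop to the context window
            next_id = mx.random.categorical(logits[:, -1, :])  # sample from the last position
            ids = mx.concatenate([ids, next_id[None, :]], axis=1)
        return decode(ids[0].tolist())

After training, calling generate on a short prompt produces the Shakespeare-flavored but largely nonsensical text described under Limitations & Caveats.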

Maintenance & Community

This appears to be a personal project by pranavjad, focused on a specific educational goal. There's no indication of ongoing community development or formal maintenance.

Licensing & Compatibility

The repository does not state a license. In the absence of one, default copyright applies, so the code cannot be safely reused in commercial or closed-source projects without the author's permission. Given the project's educational focus, treating it as a reference implementation is the safest assumption until licensing is clarified.

Limitations & Caveats

The model is a highly simplified version of GPT-2, trained on a small dataset. The character-level tokenization and small model size result in generated text that is coherent in form but nonsensical. It is not intended for practical NLP tasks but rather as a learning tool.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days
