mlx-gpt2 by pranavjad

Minimal GPT-2 implementation for educational purposes

Created 1 year ago
397 stars

Top 72.7% on SourcePulse

Project Summary

This repository provides a concise, ~200-line Python implementation of the GPT-2 architecture from scratch using the MLX framework. It's designed for educational purposes, allowing users to understand the core components of a transformer-based language model by building and training it on a small dataset. The primary benefit is a clear, step-by-step guide to implementing a foundational LLM.

How It Works

The implementation follows the standard GPT-2 architecture, starting with character-level tokenization of Shakespearean text. It builds the model layer by layer: input embeddings, positional embeddings, multi-head self-attention (with causal masking), feed-forward networks (MLPs), layer normalization, and skip connections. MLX's tensor operations and automatic differentiation are leveraged throughout for efficient computation and gradient calculation. The training loop uses AdamW optimization and cross-entropy loss.
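
To make this concrete, the block structure can be sketched in MLX roughly as follows. This is a minimal illustration of the ideas above, not the repository's code; the names (Block, n_embd, n_head) and the use of nn.MultiHeadAttention.create_additive_causal_mask for the causal mask are assumptions.

```python
import mlx.core as mx
import mlx.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention plus an MLP,
    each wrapped in a skip connection."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.n_head = n_head
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def attention(self, x):
        B, T, C = x.shape
        hd = C // self.n_head
        q, k, v = mx.split(self.qkv(x), 3, axis=-1)
        # (B, T, C) -> (B, n_head, T, head_dim)
        q = q.reshape(B, T, self.n_head, hd).transpose(0, 2, 1, 3)
        k = k.reshape(B, T, self.n_head, hd).transpose(0, 2, 1, 3)
        v = v.reshape(B, T, self.n_head, hd).transpose(0, 2, 1, 3)
        # scaled dot-product attention; the additive mask hides future positions
        scores = (q @ k.transpose(0, 1, 3, 2)) * hd**-0.5
        scores = scores + nn.MultiHeadAttention.create_additive_causal_mask(T).astype(x.dtype)
        out = mx.softmax(scores, axis=-1) @ v
        out = out.transpose(0, 2, 1, 3).reshape(B, T, C)
        return self.proj(out)

    def __call__(self, x):
        x = x + self.attention(self.ln1(x))   # skip connection around attention
        x = x + self.mlp(self.ln2(x))         # skip connection around the MLP
        return x
```

The pre-norm ordering (layer normalization applied before the attention and MLP sub-layers) follows the standard GPT-2 layout.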

Quick Start & Requirements

  • Install: pip install mlx numpy
  • Data: Requires input.txt containing training text (e.g., Shakespeare).
  • Run: Execute the train.py script (a rough training-loop sketch follows this list).
  • Hardware: Optimized for Apple Silicon (MLX). Training on a MacBook takes approximately 10 minutes.
  • Docs: Detailed explanations are within the README.
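
The quick-start steps above roughly amount to the hedged sketch below of a character-level training loop in MLX. It is not the repository's train.py; the GPT stand-in class, hyperparameters, and helper names are placeholders, and Block refers to the transformer block sketched under "How It Works".

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim
import numpy as np

# Character-level tokenization of the training text (e.g. Shakespeare).
text = open("input.txt").read()
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = np.array([stoi[ch] for ch in text], dtype=np.uint32)

ctx_len, batch_size = 64, 32   # placeholder hyperparameters

def get_batch():
    """Sample a random batch of (input, next-character target) sequences."""
    ix = np.random.randint(0, len(data) - ctx_len - 1, size=batch_size)
    x = np.stack([data[i : i + ctx_len] for i in ix])
    y = np.stack([data[i + 1 : i + ctx_len + 1] for i in ix])
    return mx.array(x), mx.array(y)

class GPT(nn.Module):
    """Hypothetical stand-in model: token + positional embeddings, a stack of
    transformer blocks (the Block class sketched above), and a language-model head."""

    def __init__(self, vocab_size, n_embd=128, n_head=4, n_layer=3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(ctx_len, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def __call__(self, idx):
        T = idx.shape[1]
        x = self.tok_emb(idx) + self.pos_emb(mx.arange(T))
        return self.head(self.ln_f(self.blocks(x)))

model = GPT(vocab_size=len(vocab))
optimizer = optim.AdamW(learning_rate=3e-4)

def loss_fn(model, x, y):
    logits = model(x)   # (batch, ctx_len, vocab_size)
    return nn.losses.cross_entropy(
        logits.reshape(-1, logits.shape[-1]), y.reshape(-1), reduction="mean"
    )

loss_and_grad = nn.value_and_grad(model, loss_fn)
for step in range(1000):
    x, y = get_batch()
    loss, grads = loss_and_grad(model, x, y)
    optimizer.update(model, grads)                    # AdamW parameter update
    mx.eval(model.parameters(), optimizer.state)      # force MLX's lazy computation
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

MLX evaluates arrays lazily, which is why the loop calls mx.eval on the parameters and optimizer state at each step.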

Highlighted Details

  • Character-level tokenization for simplicity.
  • Explicit implementation of multi-head attention using mx.as_strided for efficiency.
  • Custom weight initialization following GPT-2 paper recommendations for residual connections.
  • Generation function to produce text samples after training.
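
For the generation step, a hedged sketch along these lines (reusing stoi, ctx_len, and model from the training sketch above; the function name and sampling details are assumptions, not the repository's exact generate function):

```python
import mlx.core as mx

itos = {i: ch for ch, i in stoi.items()}   # inverse of the char-to-index map above

def generate(model, prompt="\n", max_new_tokens=200):
    """Autoregressively sample characters from the trained model."""
    tokens = mx.array([[stoi[ch] for ch in prompt]])            # shape (1, T)
    for _ in range(max_new_tokens):
        logits = model(tokens[:, -ctx_len:])                    # crop to the context window
        next_tok = mx.random.categorical(logits[:, -1, :])      # sample the next character id
        next_tok = mx.expand_dims(next_tok, 1).astype(tokens.dtype)
        tokens = mx.concatenate([tokens, next_tok], axis=1)
    return "".join(itos[t] for t in tokens[0].tolist())

print(generate(model))
```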

Maintenance & Community

This appears to be a personal project by pranavjad, focused on a specific educational goal. There's no indication of ongoing community development or formal maintenance.

Licensing & Compatibility

The repository does not explicitly state a license. Without one, default copyright applies and reuse rights are unclear; the content appears intended for educational use. Incorporating the code into commercial or closed-source projects would require clarification from the author.

Limitations & Caveats

The model is a highly simplified version of GPT-2, trained on a small dataset. The character-level tokenization and small model size result in generated text that is coherent in form but nonsensical. It is not intended for practical NLP tasks but rather as a learning tool.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 15 more.

  • gpt-neo by EleutherAI: GPT-2/3-style model implementation using mesh-tensorflow. 8k stars. Created 5 years ago; updated 3 years ago.