Minimal PyTorch LLM for educational training
SMOL-GPT provides a minimal, educational PyTorch implementation for training small Large Language Models (LLMs) from scratch. It targets researchers and developers who want to understand LLM internals, offering modern components such as Flash Attention, RMSNorm, and SwiGLU for efficient training, along with modern sampling techniques for text generation.
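
As a concrete illustration, here is a minimal sketch of top-k sampling with temperature, one common "modern sampling technique." The function name and default values are illustrative assumptions, not SMOL-GPT's actual API.

import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 50) -> torch.Tensor:
    # logits: (batch, vocab_size) scores for the next token.
    logits = logits / temperature                         # <1 sharpens, >1 flattens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
        # Mask every logit below the k-th largest so only the top-k tokens keep probability mass.
        logits = logits.masked_fill(logits < v[:, [-1]], float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)        # (batch, 1) sampled token ids

next_id = sample_next_token(torch.randn(1, 32000))        # hypothetical vocab size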
How It Works
This project implements a GPT model architecture using pure PyTorch, minimizing abstraction overhead. It incorporates modern LLM components such as Flash Attention (when available), RMSNorm, SwiGLU activations, and Rotary Positional Embeddings (RoPE) for improved performance and efficiency. Training supports mixed precision (bfloat16/float16), gradient accumulation, learning rate decay with warmup, and gradient clipping.
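
The sketch below shows what such components typically look like in pure PyTorch: an RMSNorm module, a SwiGLU feed-forward block, and attention through F.scaled_dot_product_attention, which dispatches to a fused Flash Attention kernel when one is available. Class names and dimensions are illustrative assumptions, not the repo's actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Root-mean-square normalization: rescales by the RMS of the activations, no mean-centering.
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    # Gated feed-forward block: a SiLU ("swish") gate multiplied into a parallel projection.
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Flash Attention: PyTorch picks a fused flash kernel here when hardware and dtype allow.
q = k = v = torch.randn(1, 8, 16, 64)   # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

Using scaled_dot_product_attention rather than a hand-rolled softmax(QK^T)V keeps the code minimal while still benefiting from fused kernels where they exist, which matches the project's "pure PyTorch, modern components" approach.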
Quick Start & Requirements
Install the dependencies:

pip install -r requirements.txt
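
After installation, training of this kind typically follows a loop like the sketch below, combining the features listed under How It Works: mixed-precision forward passes via autocast, gradient accumulation, learning-rate warmup with decay, and gradient clipping. The model, batch, loss, and hyperparameters are placeholders, not the repo's actual training script.

import math
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative stand-ins; the repo defines its own GPT model, data loader, and hyperparameters.
model = torch.nn.Linear(64, 64).to(device)   # placeholder for the GPT model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # needed for float16; harmless with bfloat16
accum_steps, warmup_iters, max_iters = 4, 100, 5000
max_lr, min_lr = 3e-4, 3e-5

def lr_at(it: int) -> float:
    # Linear warmup followed by cosine decay, a common schedule for small-LLM training.
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters
    t = (it - warmup_iters) / max(1, max_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

for it in range(max_iters):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(it)                  # learning-rate decay with warmup
    for _ in range(accum_steps):                 # gradient accumulation over micro-batches
        x = torch.randn(8, 64, device=device)    # placeholder batch
        with torch.autocast(device_type=device, dtype=torch.bfloat16):  # mixed precision
            loss = model(x).pow(2).mean() / accum_steps                 # placeholder loss
        scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                   # return grads to fp32 scale before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)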
Limitations & Caveats
The README notes that this implementation is primarily for educational purposes; for production-quality results, it suggests scaling up both the model size and the training dataset.