smolGPT by Om-Alve

Minimal PyTorch LLM for educational training

Created 8 months ago
1,431 stars

Top 28.5% on SourcePulse

Project Summary

smolGPT provides a minimal, educational PyTorch implementation for training small GPT-style language models (LLMs) from scratch. It targets researchers and developers who want to understand LLM internals, and offers Flash Attention, RMSNorm, and SwiGLU for efficient training, along with modern sampling techniques for text generation.

How It Works

This project implements a GPT model architecture using pure PyTorch, minimizing abstraction overhead. It incorporates modern LLM components such as Flash Attention (when available), RMSNorm, SwiGLU activations, and Rotary Positional Embeddings (RoPE) for improved performance and efficiency. Training supports mixed precision (bfloat16/float16), gradient accumulation, learning rate decay with warmup, and gradient clipping.
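The "learning rate decay with warmup" mentioned above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the summary does not say which decay curve is used, so cosine decay is an assumption here, and the function name `lr_at_step` and all default values are hypothetical.

```python
import math


def lr_at_step(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=100, total_steps=1000):
    """Hypothetical schedule: linear warmup, then cosine decay (assumed shape)."""
    # Linear warmup: ramp from max_lr / warmup_steps up to max_lr.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # After the schedule ends, hold the learning rate at its floor.
    if step >= total_steps:
        return min_lr
    # Cosine decay: progress runs from 0 to 1 over the decay phase.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (max_lr - min_lr)
```

In a training loop, a value like this would typically be written into each optimizer parameter group's `lr` entry before the optimizer step.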

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 2.0+ with CUDA, modern GPU recommended.
  • Quick Start: https://github.com/Om-Alve/smolGPT (See README for detailed training and inference commands)

Highlighted Details

  • Minimal PyTorch codebase for educational clarity.
  • Supports Flash Attention, RMSNorm, SwiGLU, and RoPE.
  • Includes built-in TinyStories dataset processing and SentencePiece tokenizer integration.
  • Offers a pre-trained checkpoint trained on the TinyStories dataset.
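
To make two of the components listed above concrete, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block. It follows the commonly used formulations of both and is not taken from the smolGPT source; the class names, layer names (`w_gate`, `w_up`, `w_down`), and the `eps` default are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale activations by their root-mean-square (no mean subtraction)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # assumed default; guards against division by zero
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1 / sqrt(mean(x^2) + eps), computed over the feature dimension.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The SiLU-activated gate multiplies the linear "up" projection elementwise.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```

Both modules preserve the input shape, so in a typical transformer block they can be applied to a `(batch, sequence, dim)` tensor in sequence: normalize first, then pass the result through the feed-forward block.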

Maintenance & Community

  • The project is maintained by Om-Alve. Contributions are welcome via issues or pull requests.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The README notes that this implementation is primarily for educational purposes and suggests scaling up model size and dataset for production use.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 30 days
