smolGPT by Om-Alve

Minimal PyTorch LLM for educational training

created 6 months ago
1,417 stars

Top 29.3% on sourcepulse

View on GitHub
Project Summary

smolGPT provides a minimal, educational PyTorch implementation for training small GPT-style language models from scratch. It targets researchers and developers who want to understand LLM internals, offering modern components such as Flash Attention, RMSNorm, and SwiGLU for efficient training, along with modern sampling techniques for inference.

How It Works

This project implements a GPT model architecture using pure PyTorch, minimizing abstraction overhead. It incorporates modern LLM components such as Flash Attention (when available), RMSNorm, SwiGLU activations, and Rotary Positional Embeddings (RoPE) for improved performance and efficiency. Training supports mixed precision (bfloat16/float16), gradient accumulation, learning rate decay with warmup, and gradient clipping.
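The RMSNorm and SwiGLU pieces are small enough to sketch in plain PyTorch. The classes below are an illustrative sketch of these standard components, not code taken from the repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS of the features,
    with a learned gain but no mean-centering and no bias."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Gated feed-forward block: silu(x W1) * (x W3), projected back by W2."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate branch
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value branch
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

Unlike LayerNorm, RMSNorm skips the mean subtraction and the bias term, which is cheaper and works well in practice for transformer LLMs.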

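A training step combining those knobs (warmup + decay schedule, gradient accumulation, mixed precision, gradient clipping) might look like this sketch. The toy model, hyperparameter values, and function names here are illustrative assumptions, not the repository's actual training script:

```python
import math
import torch

def lr_at(step: int, max_lr: float = 3e-4, min_lr: float = 3e-5,
          warmup: int = 100, total: int = 1000) -> float:
    """Linear warmup followed by cosine decay (values are illustrative)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    t = (step - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

model = torch.nn.Linear(8, 8)  # stand-in for the GPT model
opt = torch.optim.AdamW(model.parameters(), lr=lr_at(0))
accum_steps = 4  # gradient accumulation factor

for step in range(5):
    for group in opt.param_groups:           # apply the LR schedule
        group["lr"] = lr_at(step)
    for _ in range(accum_steps):             # gradient accumulation
        x = torch.randn(2, 8)
        with torch.autocast("cpu", dtype=torch.bfloat16):  # mixed precision
            loss = model(x).pow(2).mean() / accum_steps    # scale for accumulation
        loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
    opt.step()
    opt.zero_grad(set_to_none=True)
```

Dividing the loss by `accum_steps` makes the accumulated gradient equal to that of one large batch; on a CUDA device, `torch.autocast("cuda", ...)` (plus a `GradScaler` when using float16) replaces the CPU autocast shown here.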
Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 2.0+ with CUDA, modern GPU recommended.
  • Quick Start: https://github.com/Om-Alve/smolGPT (See README for detailed training and inference commands)

Highlighted Details

  • Minimal PyTorch codebase for educational clarity.
  • Supports Flash Attention, RMSNorm, SwiGLU, and RoPE.
  • Includes built-in TinyStories dataset processing and SentencePiece tokenizer integration.
  • Offers pre-trained checkpoint on TinyStories dataset.
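Of the highlighted components, RoPE is the least self-explanatory. A minimal rotary-embedding function (again an illustrative sketch, not the repository's code) rotates pairs of query/key channels by a position-dependent angle:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary positional embedding for x of shape (batch, seq_len, dim).
    Channel i of the first half is paired with channel i of the second
    half and rotated by angle position * base**(-i / (dim/2))."""
    _, seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()    # each (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin,
                      x1 * sin + x2 * cos], dim=-1)
```

Because each channel pair is rotated rather than shifted, vector norms are preserved, and a relative offset between two positions appears as a relative rotation angle in the attention dot product, which is what gives RoPE its good length generalization.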

Maintenance & Community

  • The project is maintained by Om-Alve. Contributions are welcome via issues or pull requests.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The README notes that this implementation is intended primarily for educational purposes; for production use, it suggests scaling up both the model size and the training dataset.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 78 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4% · 258 stars
Efficiently train foundation models with PyTorch
created 1 year ago · updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2% · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago