smolGPT by Om-Alve

Minimal PyTorch LLM for educational training

created 6 months ago
1,417 stars

Top 29.3% on sourcepulse

View on GitHub
Project Summary

smolGPT provides a minimal, educational PyTorch implementation for training small GPT-style language models from scratch. It targets researchers and developers who want to understand LLM internals, offering modern components such as Flash Attention, RMSNorm, and SwiGLU for efficient training, along with modern sampling techniques for inference.

How It Works

This project implements a GPT model architecture using pure PyTorch, minimizing abstraction overhead. It incorporates modern LLM components such as Flash Attention (when available), RMSNorm, SwiGLU activations, and Rotary Positional Embeddings (RoPE) for improved performance and efficiency. Training supports mixed precision (bfloat16/float16), gradient accumulation, learning rate decay with warmup, and gradient clipping.
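The RMSNorm and SwiGLU pieces are small enough to sketch in plain PyTorch. The classes below are an illustrative sketch of these standard components, not code taken from the repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS of the features,
    with a learned gain but no mean-centering and no bias."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Gated feed-forward block: silu(x W1) * (x W3), projected back by W2."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate branch
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value branch
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

Unlike LayerNorm, RMSNorm skips the mean subtraction and the bias term, which is cheaper and works well in practice for transformer LLMs.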

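A training step combining those knobs (warmup + decay schedule, gradient accumulation, mixed precision, gradient clipping) might look like this sketch. The toy model, hyperparameter values, and function names here are illustrative assumptions, not the repository's actual training script:

```python
import math
import torch

def lr_at(step: int, max_lr: float = 3e-4, min_lr: float = 3e-5,
          warmup: int = 100, total: int = 1000) -> float:
    """Linear warmup followed by cosine decay (values are illustrative)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    t = (step - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

model = torch.nn.Linear(8, 8)  # stand-in for the GPT model
opt = torch.optim.AdamW(model.parameters(), lr=lr_at(0))
accum_steps = 4  # gradient accumulation factor

for step in range(5):
    for group in opt.param_groups:           # apply the LR schedule
        group["lr"] = lr_at(step)
    for _ in range(accum_steps):             # gradient accumulation
        x = torch.randn(2, 8)
        with torch.autocast("cpu", dtype=torch.bfloat16):  # mixed precision
            loss = model(x).pow(2).mean() / accum_steps    # scale for accumulation
        loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
    opt.step()
    opt.zero_grad(set_to_none=True)
```

Dividing the loss by `accum_steps` makes the accumulated gradient equal to that of one large batch; on a CUDA device, `torch.autocast("cuda", ...)` (plus a `GradScaler` when using float16) replaces the CPU autocast shown here.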
Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 2.0+ with CUDA, modern GPU recommended.
  • Quick Start: https://github.com/Om-Alve/smolGPT (See README for detailed training and inference commands)

Highlighted Details

  • Minimal PyTorch codebase for educational clarity.
  • Supports Flash Attention, RMSNorm, SwiGLU, and RoPE.
  • Includes built-in TinyStories dataset processing and SentencePiece tokenizer integration.
  • Offers pre-trained checkpoint on TinyStories dataset.
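Of the highlighted components, RoPE is the least self-explanatory. A minimal rotary-embedding function (again an illustrative sketch, not the repository's code) rotates pairs of query/key channels by a position-dependent angle:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary positional embedding for x of shape (batch, seq_len, dim).
    Channel i of the first half is paired with channel i of the second
    half and rotated by angle position * base**(-i / (dim/2))."""
    _, seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()    # each (seq_len, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin,
                      x1 * sin + x2 * cos], dim=-1)
```

Because each channel pair is rotated rather than shifted, vector norms are preserved, and a relative offset between two positions appears as a relative rotation angle in the attention dot product, which is what gives RoPE its good length generalization.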

Maintenance & Community

  • The project is maintained by Om-Alve. Contributions are welcome via issues or pull requests.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The README notes that this implementation is intended primarily for educational purposes; for production use, it suggests scaling up both the model size and the training dataset.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 78 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4% · 258 stars
Efficiently train foundation models with PyTorch
created 1 year ago · updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2% · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago