Transformer-from-scratch by waylandzhang

LLM training demo with ~240 lines of code

created 1 year ago
420 stars

Top 71.0% on sourcepulse

View on GitHub
Project Summary

This repository provides a minimal, ~240-line PyTorch implementation of a Transformer-based Large Language Model (LLM) trained from scratch. It's designed for beginners to understand LLM training fundamentals, demonstrating the process on a small textbook dataset with a ~51M parameter model.

How It Works

The implementation focuses on the decoder-only architecture, mirroring GPT models. It utilizes standard Transformer components like self-attention and feed-forward networks. The code is intentionally kept simple for educational purposes, allowing users to easily trace the data flow and training dynamics.
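
To make the data flow concrete, below is a minimal sketch of one such decoder block in PyTorch. The layer sizes, names, and use of nn.MultiheadAttention are illustrative assumptions and do not necessarily match the repository's model.py.

    import torch
    import torch.nn as nn

    class DecoderBlock(nn.Module):
        """One pre-norm decoder block: masked multi-head self-attention plus a
        position-wise feed-forward network. All sizes here are illustrative."""

        def __init__(self, d_model=512, n_heads=8, context_len=256, dropout=0.1):
            super().__init__()
            self.ln1 = nn.LayerNorm(d_model)
            self.ln2 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              dropout=dropout, batch_first=True)
            self.ff = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
                nn.Dropout(dropout),
            )
            # Causal mask: True marks positions a token may NOT attend to,
            # i.e. anything to its right in the sequence.
            mask = torch.triu(torch.ones(context_len, context_len, dtype=torch.bool),
                              diagonal=1)
            self.register_buffer("causal_mask", mask)

        def forward(self, x):                      # x: (batch, seq_len, d_model)
            T = x.size(1)
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h,
                                    attn_mask=self.causal_mask[:T, :T],
                                    need_weights=False)
            x = x + attn_out                       # residual connection
            x = x + self.ff(self.ln2(x))           # residual connection
            return x

    # Example: a batch of 4 sequences, each 128 tokens of dimension 512.
    x = torch.randn(4, 128, 512)
    print(DecoderBlock()(x).shape)                 # torch.Size([4, 128, 512])

A full model stacks several such blocks on top of token and position embeddings and ends with a linear projection to vocabulary logits.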

Quick Start & Requirements

  • Install dependencies: pip install numpy requests torch tiktoken matplotlib pandas
  • Run training: python model.py (a sketch of this kind of training setup follows this list)
  • The first run downloads the dataset. Training takes ~20 minutes on a single i7 CPU.
  • A Jupyter Notebook (step-by-step.ipynb) is available for a detailed walk-through of the architecture.
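
As referenced above, the sketch below shows the kind of nanoGPT-style tokenization and batch sampling a training script like model.py typically performs before its training loop. The stand-in text, tiktoken encoding name, and hyperparameters are assumptions, not taken from the repository.

    import torch
    import tiktoken

    # Stand-in corpus; the real script downloads a small textbook dataset.
    text = "The quick brown fox jumps over the lazy dog. " * 500

    # The repo lists tiktoken as a dependency; the exact encoding it uses is an
    # assumption here.
    enc = tiktoken.get_encoding("cl100k_base")
    data = torch.tensor(enc.encode(text), dtype=torch.long)

    block_size, batch_size = 256, 32               # illustrative hyperparameters

    def get_batch():
        """Sample random (input, next-token target) windows from the token stream."""
        ix = torch.randint(len(data) - block_size - 1, (batch_size,))
        x = torch.stack([data[i:i + block_size] for i in ix])
        y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
        return x, y

    # A training loop then repeatedly samples a batch, computes the cross-entropy
    # loss between the model's logits and the shifted targets, and takes an
    # optimizer step, e.g.:
    #
    #   logits = model(xb)                                  # (B, T, vocab_size)
    #   loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
    #   optimizer.zero_grad(); loss.backward(); optimizer.step()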

Highlighted Details

  • Trains a ~51M-parameter model on a ~450 KB textbook dataset.
  • Reaches a training loss of about 2.807 after 5,000 iterations.
  • Includes a step-by-step Jupyter Notebook visualizing intermediate Transformer layers and attention mechanisms.
  • Offers sample code for fine-tuning and inference with pre-trained GPT-2 models in the /GPT2 directory (an illustrative inference snippet follows this list).
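
The sketch below shows one common way to run inference with a pre-trained GPT-2, using the Hugging Face transformers library; the repository's /GPT2 samples may take a different approach, and the prompt and sampling settings here are arbitrary.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Load the smallest pre-trained GPT-2 checkpoint and its tokenizer.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "Once upon a time"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=40,        # length of the continuation
            do_sample=True,           # sample instead of greedy decoding
            top_k=50,                 # restrict sampling to the 50 most likely tokens
            pad_token_id=tokenizer.eos_token_id,
        )

    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))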

Maintenance & Community

The project is a personal demonstration by waylandzhang, inspired by nanoGPT. No specific community channels or active maintenance beyond the initial release are indicated.

Licensing & Compatibility

The repository does not state a license, so default copyright applies: all rights are reserved, and use, modification, or distribution may be restricted. Commercial use or inclusion in closed-source projects is not advised without explicit permission from the author.

Limitations & Caveats

The project is a simplified demo and not intended for production use. It uses a very small dataset, and the model's capabilities are limited. The lack of an explicit license poses significant adoption risks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 54 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n) and Georgios Konstantopoulos (CTO and General Partner at Paradigm).

mlx-gpt2 by pranavjad

0.5%
393
Minimal GPT-2 implementation for educational purposes
created 1 year ago
updated 1 year ago
Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4%
258
Efficiently train foundation models with PyTorch
created 1 year ago
updated 1 week ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Alex Cheema (cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

0.1%
806
Pretraining code for depth-recurrent language model research
created 5 months ago
updated 2 weeks ago