Transformer-from-scratch by waylandzhang

LLM training demo with ~240 lines of code

created 1 year ago
420 stars

Top 71.0% on sourcepulse

View on GitHub
Project Summary

This repository provides a minimal, ~240-line PyTorch implementation of a Transformer-based Large Language Model (LLM) trained from scratch. It's designed for beginners to understand LLM training fundamentals, demonstrating the process on a small textbook dataset with a ~51M parameter model.

How It Works

The implementation focuses on the decoder-only architecture, mirroring GPT models. It utilizes standard Transformer components like self-attention and feed-forward networks. The code is intentionally kept simple for educational purposes, allowing users to easily trace the data flow and training dynamics.
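
To make the data flow concrete, below is a minimal sketch of one such decoder block in PyTorch. The layer sizes, names, and use of nn.MultiheadAttention are illustrative assumptions and do not necessarily match the repository's model.py.

    import torch
    import torch.nn as nn

    class DecoderBlock(nn.Module):
        """One pre-norm decoder block: masked multi-head self-attention plus a
        position-wise feed-forward network. All sizes here are illustrative."""

        def __init__(self, d_model=512, n_heads=8, context_len=256, dropout=0.1):
            super().__init__()
            self.ln1 = nn.LayerNorm(d_model)
            self.ln2 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              dropout=dropout, batch_first=True)
            self.ff = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
                nn.Dropout(dropout),
            )
            # Causal mask: True marks positions a token may NOT attend to,
            # i.e. anything to its right in the sequence.
            mask = torch.triu(torch.ones(context_len, context_len, dtype=torch.bool),
                              diagonal=1)
            self.register_buffer("causal_mask", mask)

        def forward(self, x):                      # x: (batch, seq_len, d_model)
            T = x.size(1)
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h,
                                    attn_mask=self.causal_mask[:T, :T],
                                    need_weights=False)
            x = x + attn_out                       # residual connection
            x = x + self.ff(self.ln2(x))           # residual connection
            return x

    # Example: a batch of 4 sequences, each 128 tokens of dimension 512.
    x = torch.randn(4, 128, 512)
    print(DecoderBlock()(x).shape)                 # torch.Size([4, 128, 512])

A full model stacks several such blocks on top of token and position embeddings and ends with a linear projection to vocabulary logits.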

Quick Start & Requirements

  • Install dependencies: pip install numpy requests torch tiktoken matplotlib pandas
  • Run training: python model.py (a sketch of this kind of training setup follows this list)
  • The first run downloads the dataset. Training takes ~20 minutes on a single i7 CPU.
  • A Jupyter Notebook (step-by-step.ipynb) is available for a detailed walk-through of the architecture.
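
As referenced above, the sketch below shows the kind of nanoGPT-style tokenization and batch sampling a training script like model.py typically performs before its training loop. The stand-in text, tiktoken encoding name, and hyperparameters are assumptions, not taken from the repository.

    import torch
    import tiktoken

    # Stand-in corpus; the real script downloads a small textbook dataset.
    text = "The quick brown fox jumps over the lazy dog. " * 500

    # The repo lists tiktoken as a dependency; the exact encoding it uses is an
    # assumption here.
    enc = tiktoken.get_encoding("cl100k_base")
    data = torch.tensor(enc.encode(text), dtype=torch.long)

    block_size, batch_size = 256, 32               # illustrative hyperparameters

    def get_batch():
        """Sample random (input, next-token target) windows from the token stream."""
        ix = torch.randint(len(data) - block_size - 1, (batch_size,))
        x = torch.stack([data[i:i + block_size] for i in ix])
        y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
        return x, y

    # A training loop then repeatedly samples a batch, computes the cross-entropy
    # loss between the model's logits and the shifted targets, and takes an
    # optimizer step, e.g.:
    #
    #   logits = model(xb)                                  # (B, T, vocab_size)
    #   loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
    #   optimizer.zero_grad(); loss.backward(); optimizer.step()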

Highlighted Details

  • Trains a ~51M-parameter model on a ~450 KB textbook dataset.
  • Reaches a training loss of about 2.807 after 5,000 iterations.
  • Includes a step-by-step Jupyter Notebook visualizing intermediate Transformer layers and attention mechanisms.
  • Offers sample code for fine-tuning and inference with pre-trained GPT-2 models in the /GPT2 directory (an illustrative inference snippet follows this list).
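
The sketch below shows one common way to run inference with a pre-trained GPT-2, using the Hugging Face transformers library; the repository's /GPT2 samples may take a different approach, and the prompt and sampling settings here are arbitrary.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Load the smallest pre-trained GPT-2 checkpoint and its tokenizer.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "Once upon a time"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=40,        # length of the continuation
            do_sample=True,           # sample instead of greedy decoding
            top_k=50,                 # restrict sampling to the 50 most likely tokens
            pad_token_id=tokenizer.eos_token_id,
        )

    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))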

Maintenance & Community

The project is a personal demonstration by waylandzhang, inspired by nanoGPT. No specific community channels or active maintenance beyond the initial release are indicated.

Licensing & Compatibility

The repository does not state a license, so default copyright applies: all rights are reserved, and use, modification, or distribution may be restricted. Commercial use or inclusion in closed-source projects is not advised without explicit permission from the author.

Limitations & Caveats

The project is a simplified demo and not intended for production use. It uses a very small dataset, and the model's capabilities are limited. The lack of an explicit license poses significant adoption risks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 54 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n) and Georgios Konstantopoulos (CTO and General Partner at Paradigm).

mlx-gpt2 by pranavjad

0.5%
393
Minimal GPT-2 implementation for educational purposes
created 1 year ago
updated 1 year ago
Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4%
258
Efficiently train foundation models with PyTorch
created 1 year ago
updated 1 week ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Alex Cheema (cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

0.1%
806
Pretraining code for depth-recurrent language model research
created 5 months ago
updated 2 weeks ago