Transformer-from-scratch by waylandzhang

LLM training demo with ~240 lines of code

Created 1 year ago · 442 stars · Top 67.8% on SourcePulse

Project Summary

This repository provides a minimal, ~240-line PyTorch implementation of a Transformer-based Large Language Model (LLM) trained from scratch. It is aimed at beginners who want to understand the fundamentals of LLM training, demonstrating the full process on a small textbook dataset with a ~51M-parameter model.

How It Works

The implementation follows the decoder-only architecture used by GPT-style models, built from standard Transformer components: masked multi-head self-attention and position-wise feed-forward networks. The code is intentionally kept simple for educational purposes, so users can easily trace the data flow and training dynamics.
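
For orientation, here is a minimal sketch of such a decoder block in PyTorch. The dimensions, pre-norm layout, and use of nn.MultiheadAttention are illustrative assumptions, not the repository's exact code:

    import torch
    import torch.nn as nn

    class DecoderBlock(nn.Module):
        """One GPT-style decoder block: masked self-attention plus a
        feed-forward network, each with a residual connection
        (pre-norm layout assumed here for illustration)."""

        def __init__(self, d_model: int = 512, n_heads: int = 8):
            super().__init__()
            self.ln1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln2 = nn.LayerNorm(d_model)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Causal mask: True entries are blocked, so each position
            # attends only to itself and earlier tokens.
            T = x.size(1)
            mask = torch.triu(
                torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
            )
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h, attn_mask=mask)
            x = x + attn_out
            x = x + self.ffn(self.ln2(x))
            return x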

Quick Start & Requirements

  • Install dependencies: pip install numpy requests torch tiktoken matplotlib pandas (tiktoken supplies the tokenizer; see the sketch after this list)
  • Run training: python model.py
  • The first run downloads the dataset. Training takes ~20 minutes on a single i7 CPU.
  • A Jupyter Notebook (step-by-step.ipynb) is available for detailed architectural understanding.
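
The listed tiktoken dependency handles tokenization. As a minimal sketch of that step (the choice of the GPT-2 BPE encoding here is an assumption, not necessarily what model.py uses):

    import tiktoken

    # Load a byte-pair-encoding vocabulary; "gpt2" is assumed for illustration.
    enc = tiktoken.get_encoding("gpt2")

    text = "The quick brown fox jumps over the lazy dog."
    ids = enc.encode(text)    # text -> token ids for training
    print(ids)
    print(enc.decode(ids))    # round-trip back to text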

Highlighted Details

  • Trains a ~51M-parameter model on a ~450 KB dataset.
  • Reaches a training loss of about 2.807 after 5,000 iterations (see the training-loop sketch after this list).
  • Includes a step-by-step Jupyter Notebook visualizing intermediate Transformer layers and attention mechanisms.
  • Offers sample code for fine-tuning and inference with pre-trained GPT-2 models in the /GPT2 directory.
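
The loss figure above comes from plain next-token prediction. A hedged sketch of what such a training loop looks like (the optimizer choice, learning rate, and get_batch helper are illustrative assumptions, not the repository's exact code):

    import torch
    import torch.nn.functional as F

    def train(model, get_batch, iters=5000, lr=3e-4, device="cpu"):
        # get_batch is a hypothetical helper returning (inputs, targets),
        # both (B, T) tensors of token ids with targets shifted by one.
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        model.to(device).train()
        for step in range(iters):
            x, y = get_batch()
            x, y = x.to(device), y.to(device)
            logits = model(x)  # (B, T, vocab_size)
            loss = F.cross_entropy(
                logits.view(-1, logits.size(-1)), y.view(-1)
            )
            opt.zero_grad(set_to_none=True)
            loss.backward()
            opt.step()
            if step % 500 == 0:
                print(f"step {step}: loss {loss.item():.3f}")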

Maintenance & Community

The project is a personal demonstration by waylandzhang, inspired by nanoGPT. No specific community channels or active maintenance beyond the initial release are indicated.

Licensing & Compatibility

The repository does not state a license. Under default copyright, all rights are reserved, so usage, modification, and distribution may be restricted. Commercial use or incorporation into other projects is not advised without the author's explicit permission.

Limitations & Caveats

The project is a simplified demo and not intended for production use. It uses a very small dataset, and the model's capabilities are limited. The lack of an explicit license poses significant adoption risks.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days
