raiyanyahya
Build a modern LLM from scratch, line by line
Top 23.4% on SourcePulse
Build a modern Large Language Model (LLM) from scratch with this interactive textbook. Aimed at Python developers and students, it provides a deep, line-by-line understanding of Transformer architectures, enabling users to construct and train their own GPT models, moving beyond superficial API usage to grasp core computational principles.
How It Works
This project is a 12-chapter, ~3,600-line interactive guide in which users write every component of a GPT model themselves. Each concept is taught three ways: an analogy simple enough for a five-year-old, a worked numerical example, and closely annotated code. The core implementation is a decoder-only Transformer that uses techniques common in current models: Rotary Positional Embeddings (RoPE), RMSNorm, SwiGLU, the AdamW optimizer, Byte Pair Encoding (BPE) tokenization, weight tying, and mixed-precision training. The aim is a thorough grasp of the internal mechanics, in contrast to shallow API-based tutorials or dense academic papers.
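To make two of the techniques named above concrete, here is a minimal sketch of RMSNorm and RoPE in plain Python, in the spirit of a worked numerical example. It is not taken from the repository: the function names (`rms_norm`, `rope`, `dot`), the `eps` and `base` defaults, and the interleaved pairing of dimensions are assumptions made for illustration, and a real implementation would operate on PyTorch tensors.

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """Scale x so its root-mean-square is about 1, then apply a per-element gain."""
    if gain is None:
        gain = [1.0] * len(x)  # a learned parameter in a real model
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]

def rope(x, position, base=10000.0):
    """Rotate each pair (x[0], x[1]), (x[2], x[3]), ... by a position-dependent angle."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        # Pair number i/2 uses frequency base^(-2*(i/2)/d) = base^(-i/d).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# RMSNorm on [3, 4]: the RMS is sqrt((9 + 16) / 2), about 3.536.
normed = rms_norm([3.0, 4.0])
print([round(v, 4) for v in normed])

# RoPE: the attention score depends only on the distance between positions.
q = [1.0, 0.0, 0.5, -0.5]
k = [0.2, 0.7, -0.3, 0.1]
score_near_start = dot(rope(q, 5), rope(k, 3))
score_shifted = dot(rope(q, 105), rope(k, 103))
print(abs(score_near_start - score_shifted) < 1e-9)
```

The last two lines illustrate why RoPE encodes relative position: shifting both the query and the key by the same offset leaves their dot product unchanged, up to floating-point error.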
Quick Start & Requirements
Clone the repository:
git clone https://github.com/raiyanyahya/how-to-train-your-gpt.git
Create and activate a virtual environment (or use the Windows equivalent):
python -m venv gpt_env && source gpt_env/bin/activate
Install the dependencies:
pip install torch tiktoken datasets numpy matplotlib
Check whether CUDA is available:
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
Prerequisites: Basic Python proficiency (variables, functions, classes). No prior ML, calculus, or linear algebra knowledge is assumed; these are taught in context. GPU acceleration is highly recommended for training (~2 hours on an RTX 3090); CPU-only execution is approximately 10-50x slower.
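The CUDA check and the timing figures above can be combined into a small pre-flight script. This is a sketch under stated assumptions rather than part of the repository: `pick_device` and `cpu_hours_range` are hypothetical helper names, and the 2-hour baseline and 10-50x slowdown are simply the figures quoted above.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed yet, so there is nothing to accelerate
    return "cuda" if torch.cuda.is_available() else "cpu"

def cpu_hours_range(gpu_hours, slowdown=(10, 50)):
    """Apply the quoted 10-50x CPU slowdown to a GPU training time in hours."""
    low, high = slowdown
    return gpu_hours * low, gpu_hours * high

device = pick_device()
low, high = cpu_hours_range(2.0)  # ~2 hours is the quoted RTX 3090 figure
print(f"device: {device}")
if device == "cpu":
    print(f"expect roughly {low:.0f} to {high:.0f} hours of training on CPU")
```

On a CPU-only machine this puts the quoted run at roughly 20 to 100 hours, which is worth knowing before starting a full training run.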
Highlighted Details
main.py script for end-to-end training and inference.
Maintenance & Community
The repository welcomes issues and pull requests. Specific community channels (e.g., Discord, Slack), roadmap details, or notable contributor/sponsorship information are not detailed in the README.
Licensing & Compatibility
The open-source license for this repository is not explicitly stated in the provided README. Potential users should verify licensing terms before integration into commercial or closed-source projects.
Limitations & Caveats
This project is primarily an educational tool for understanding LLM internals, not a production-ready framework. CPU-only training is approximately 10-50x slower than training on a GPU. The implemented architecture is based on publicly disclosed techniques only; proprietary details of models like GPT-4 and Claude are undisclosed and therefore not covered.