Tutorial for building LLMs from scratch
Top 27.4% on sourcepulse
This repository is a comprehensive, hands-on guide to building Large Language Models (LLMs) from scratch. It targets engineers and researchers who want to understand and implement LLM training, fine-tuning, and deployment, offering a structured learning path with accompanying video tutorials.
How It Works
The project emphasizes a "from scratch" implementation approach, mirroring Andrej Karpathy's educational style. It covers foundational LLM concepts, dense models, Mixture-of-Experts (MoE) architectures, and fine-tuning techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF). The code is written to be educational, with explanations integrated into the development process.
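To make one of the covered techniques concrete, here is a minimal sketch of the DPO loss for a single preference pair. This is illustrative code, not the repository's implementation; the function name, argument names, and the default beta value are assumptions. Each argument is the summed log-probability of a full response under either the trainable policy or the frozen reference model.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair (illustrative sketch)."""
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the reference, compared to the rejected response.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(beta * margin)), computed as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))
```

When policy and reference agree (zero margin) the loss is log(2); as the policy learns to prefer the chosen response more strongly than the reference does, the margin grows and the loss falls toward zero.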
Quick Start & Requirements
Code lives in the src/ directory and is organized by chapter. Notebooks are available for direct execution.
Highlighted Details
Maintenance & Community
The project is actively developed by bbruceyuan, with community engagement encouraged via WeChat, a personal blog, and a public WeChat account.
Licensing & Compatibility
The repository's licensing is not explicitly stated in the provided README.
Limitations & Caveats
Some sections, such as the nanoGPT implementation and activation function optimization, are marked as "todo," indicating incomplete content. The project is presented as an ongoing learning resource.