YYZhang2025
Building Large Language Models from scratch
Top 99.8% on SourcePulse
This repository offers solutions and detailed notes for Stanford's CS336 "Language Modeling from Scratch" course, covering foundational and advanced topics. It is aimed at engineers and researchers who want to build and understand Large Language Models by implementing the core components themselves, which makes it a useful resource for hands-on learning and experimentation.
How It Works
The project systematically implements key LLM building blocks. It begins with Byte Pair Encoding (BPE) for tokenization and progresses to a configurable Transformer language model featuring RMS Norm and Rotary Positional Embeddings (RoPE). Advanced sections explore Mixture of Experts (MoE) layers for enhanced model capacity, Triton-based Flash Attention for computational efficiency, and data parallelism for distributed training. The final assignments focus on LLM alignment techniques, including Supervised Fine-Tuning (SFT), Expert Iteration (EI), and Group Relative Policy Optimization (GRPO), applied to reasoning tasks.
Quick Start & Requirements
Primary install: uv for environment management (pip install uv or brew install uv). Code execution via uv run <python_file_path>. Dependencies are managed by uv sync.
Prerequisites: uv, wget, huggingface_hub.
Training performance benchmarks are based on a single NVIDIA H100 GPU.
The Flash Attention implementation uses Triton.
Alignment tasks use the GSM8k and Math-12k datasets.
Highlighted Details
d_ff, outperformed a dense model of similar computational cost on the TinyStories dataset.
Maintenance & Community
The provided README does not contain information regarding maintenance status, community channels (e.g., Discord, Slack), or a public roadmap.
Licensing & Compatibility
The README content does not specify the project's license or any compatibility notes for commercial use.
Limitations & Caveats
d_ff to the dense model showed no significant improvement, possibly due to overfitting on the small TinyStories dataset.