A minimal, full-stack LLM implementation for accessible AI development
Top 2.1% on SourcePulse
Summary
karpathy/nanochat provides a minimal, full-stack codebase for building and deploying ChatGPT-like Large Language Models (LLMs) on a budget. Targeting engineers and researchers, it enables end-to-end LLM development, from tokenization to web serving, within a single, hackable repository, democratizing LLM experimentation with low cost and low cognitive complexity.
How It Works
The project implements the entire LLM pipeline in a clean, dependency-lite architecture. It focuses on a unified approach, covering tokenization, pretraining, finetuning, evaluation, inference, and a simple web UI for interaction. This design prioritizes hackability and understanding, allowing users to run and modify the complete LLM lifecycle on accessible hardware.
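The end-to-end idea can be sketched in miniature. The toy below is purely illustrative (a character-level bigram model, not nanochat's actual architecture or code), but it walks the same stages the repository covers: tokenize, train, and generate.

```python
# Hypothetical end-to-end sketch: tokenize -> "pretrain" -> generate.
# A character-level bigram model stands in for the real Transformer.
import random

corpus = "hello world. hello there. world peace."

# Tokenization: build a character-level vocabulary.
vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}
tokens = [stoi[ch] for ch in corpus]

# "Pretraining": estimate next-token counts with add-one smoothing.
counts = [[1] * len(vocab) for _ in vocab]
for a, b in zip(tokens, tokens[1:]):
    counts[a][b] += 1

# Inference: sample a continuation one token at a time.
def generate(prompt, n_tokens, rng):
    out = [stoi[ch] for ch in prompt]
    for _ in range(n_tokens):
        row = counts[out[-1]]  # next-token weights for the last token
        out.append(rng.choices(range(len(vocab)), weights=row)[0])
    return "".join(itos[t] for t in out)

print(generate("hel", 20, random.Random(0)))
```

Replacing the bigram count table with a trained Transformer, and the character vocabulary with a learned tokenizer, yields the full pipeline the repository implements.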
Quick Start & Requirements
Run bash speedrun.sh for $100-tier model training and inference. Serve the web UI with python -m scripts.chat_web. Dependencies are managed in a uv virtual environment. speedrun.sh completes in approximately 4 hours on an 8xH100 node ($24/hr).
Highlighted Details
Generates a report (report.md) covering the different training stages.
Maintenance & Community
Authored by Andrej Karpathy, with advice from Alec Radford. Leverages data from HuggingFace and compute resources from Lambda. Community support and detailed walkthroughs are available in the repository's Discussions.
Licensing & Compatibility
Limitations & Caveats
The $100-tier model exhibits limited capabilities, akin to a "kindergartener," reflected in lower benchmark scores. Higher-tier models are not fully supported in the master branch. Running on hardware with less than 80GB VRAM requires significant hyperparameter tuning (e.g., device_batch_size), and single-GPU execution is substantially slower. The project is described as a "strong baseline" and is "nowhere finished."
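To make the device_batch_size tradeoff concrete: gradient accumulation is the standard technique for preserving the effective batch size when the per-device batch must shrink to fit in VRAM. The variable names and numbers below are illustrative assumptions, not nanochat's actual configuration.

```python
# Illustrative arithmetic (assumed values, not nanochat's real config):
# shrinking device_batch_size for a small GPU, then compensating with
# gradient accumulation so the optimizer still sees the target batch.
target_total_batch = 512      # sequences per optimizer step (assumed)
num_gpus = 1                  # single-GPU fallback
device_batch_size = 8         # reduced to fit limited VRAM

# Run several forward/backward passes before each optimizer step.
grad_accum_steps = target_total_batch // (device_batch_size * num_gpus)
effective_batch = device_batch_size * num_gpus * grad_accum_steps

print(grad_accum_steps, effective_batch)  # prints: 64 512
```

The slowdown on a single GPU follows directly: each optimizer step now requires 64 sequential micro-batches instead of one large parallel pass across eight devices.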