nanochat by karpathy

A minimal, full-stack LLM implementation for accessible AI development

Created 3 days ago

20,390 stars

Top 2.1% on SourcePulse

View on GitHub: https://github.com/karpathy/nanochat

Summary

karpathy/nanochat provides a minimal, full-stack codebase for building and deploying ChatGPT-like Large Language Models (LLMs) on a budget. Targeting engineers and researchers, it enables end-to-end LLM development (from tokenization to web serving) within a single, hackable repository, democratizing LLM experimentation by keeping both cost and cognitive complexity low.

How It Works

The project implements the entire LLM pipeline in a clean, dependency-light architecture: tokenization, pretraining, finetuning, evaluation, inference, and a simple web UI for interaction all live in one unified codebase. This design prioritizes hackability and understanding, allowing users to run and modify the complete LLM lifecycle on accessible hardware.
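
To make that flow concrete, the sequence below is a rough, illustrative sketch of how such a pipeline could be chained from the command line. Only speedrun.sh and scripts.chat_web are named in this summary; the other per-stage module names are hypothetical placeholders, not the repository's actual entry points.

    # Illustrative pipeline sketch. Stages marked "hypothetical" are placeholders;
    # speedrun.sh is the real end-to-end driver described in this summary.
    python -m scripts.tok_train      # hypothetical: train the tokenizer
    python -m scripts.base_train     # hypothetical: pretrain the base model
    python -m scripts.chat_sft       # hypothetical: finetune for chat
    python -m scripts.chat_eval      # hypothetical: run evaluations (cf. report.md)
    python -m scripts.chat_web       # from this summary: serve the web UI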

Quick Start & Requirements

  • Install/Run: Execute bash speedrun.sh for the $100 tier model training and inference, and serve the web UI with python -m scripts.chat_web (a command sketch follows this list).
  • Prerequisites: Python and a uv virtual environment; an 8x NVIDIA H100 node is recommended.
  • Resource Footprint: speedrun.sh completes in approximately 4 hours on an 8xH100 node ($24/hr).
  • Docs: Walkthrough available in the repository's Discussions.
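
Putting the documented commands together, a typical session might look like the sketch below. Only speedrun.sh and python -m scripts.chat_web are given in this summary; the clone and uv environment-setup lines are assumptions about a standard workflow.

    # Clone the repository (URL inferred from karpathy/nanochat)
    git clone https://github.com/karpathy/nanochat.git
    cd nanochat

    # Assumption: create and activate a uv virtual environment
    uv venv && source .venv/bin/activate

    # $100 tier: end-to-end training and inference (~4 hours on an 8xH100 node)
    bash speedrun.sh

    # Serve the web UI for interaction
    python -m scripts.chat_web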

Highlighted Details

  • Budget LLM: Aims to deliver a ChatGPT-like experience for under $100.
  • End-to-End Pipeline: Integrates all stages of LLM development in one codebase.
  • Codebase Analysis: The entire project can be packaged into a single prompt so an LLM can be queried about the code (a generic sketch follows this list).
  • Performance Reporting: Includes detailed evaluation metrics (report.md) for different training stages.
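
For the codebase-analysis point above, one generic way to bundle the source tree into a single prompt file is sketched below. This is an assumption about how such packaging could be done, not the project's own mechanism.

    # Illustrative only: concatenate source and docs into one prompt file.
    # The repository may ship its own packaging helper; this is a generic stand-in.
    find . -name '*.py' -o -name '*.md' | sort | while read -r f; do
      printf '\n===== %s =====\n' "$f"
      cat "$f"
    done > nanochat_prompt.txt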

Maintenance & Community

Authored by Andrej Karpathy, with advice from Alec Radford. The project leverages data from Hugging Face and compute resources from Lambda. Community support and detailed walkthroughs are available in the repository's Discussions.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license is permissive, allowing commercial use and integration into closed-source projects; the only obligation is retaining the copyright and license notice.

Limitations & Caveats

The $100 tier model exhibits limited capabilities, akin to a "kindergartener," reflected in lower benchmark scores. Higher-tier models are not fully supported in the master branch. Running on hardware with less than 80GB VRAM requires significant hyperparameter tuning (e.g., device_batch_size), and single-GPU execution is substantially slower. The project is described as a "strong baseline" and is "nowhere finished."
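
As a purely hypothetical illustration of the VRAM caveat, shrinking the per-device batch size might look like the line below; device_batch_size is the knob this summary names, but the script name and flag syntax are assumptions.

    # Hypothetical: reduce per-device batch size on a GPU with less than 80GB VRAM.
    # Exact entry point and flag syntax are assumptions; see speedrun.sh for the real ones.
    python -m scripts.base_train --device_batch_size=8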

Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 51
  • Issues (30d): 22

Star History

20,635 stars in the last 3 days

Explore Similar Projects

Starred by Elvis Saravia (Founder of DAIR.AI), Tom Moor (Head of Engineering at Linear; Founder of Outline), and 6 more.

  • promptable by cfortuner: TS/JS library for building full-stack AI apps (2k stars; created 2 years ago, last updated 2 years ago)