nanochat by karpathy

A minimal, full-stack LLM implementation for accessible AI development

Created 3 days ago

20,390 stars

Top 2.1% on SourcePulse

View on GitHub: https://github.com/karpathy/nanochat

Summary

karpathy/nanochat provides a minimal, full-stack codebase for building and deploying ChatGPT-like Large Language Models (LLMs) on a budget. Targeting engineers and researchers, it enables end-to-end LLM development (from tokenization to web serving) within a single, hackable repository, democratizing LLM experimentation by keeping both cost and cognitive complexity low.

How It Works

The project implements the entire LLM pipeline in a clean, dependency-light architecture: tokenization, pretraining, finetuning, evaluation, inference, and a simple web UI for interaction all live in one unified codebase. This design prioritizes hackability and understanding, allowing users to run and modify the complete LLM lifecycle on accessible hardware.
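
To make that flow concrete, the sequence below is a rough, illustrative sketch of how such a pipeline could be chained from the command line. Only speedrun.sh and scripts.chat_web are named in this summary; the other per-stage module names are hypothetical placeholders, not the repository's actual entry points.

    # Illustrative pipeline sketch. Stages marked "hypothetical" are placeholders;
    # speedrun.sh is the real end-to-end driver described in this summary.
    python -m scripts.tok_train      # hypothetical: train the tokenizer
    python -m scripts.base_train     # hypothetical: pretrain the base model
    python -m scripts.chat_sft       # hypothetical: finetune for chat
    python -m scripts.chat_eval      # hypothetical: run evaluations (cf. report.md)
    python -m scripts.chat_web       # from this summary: serve the web UI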

Quick Start & Requirements

  • Install/Run: Execute bash speedrun.sh for the $100 tier model training and inference, and serve the web UI with python -m scripts.chat_web (a command sketch follows this list).
  • Prerequisites: Python and a uv virtual environment; an 8x NVIDIA H100 node is recommended.
  • Resource Footprint: speedrun.sh completes in approximately 4 hours on an 8xH100 node ($24/hr).
  • Docs: Walkthrough available in the repository's Discussions.
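
Putting the documented commands together, a typical session might look like the sketch below. Only speedrun.sh and python -m scripts.chat_web are given in this summary; the clone and uv environment-setup lines are assumptions about a standard workflow.

    # Clone the repository (URL inferred from karpathy/nanochat)
    git clone https://github.com/karpathy/nanochat.git
    cd nanochat

    # Assumption: create and activate a uv virtual environment
    uv venv && source .venv/bin/activate

    # $100 tier: end-to-end training and inference (~4 hours on an 8xH100 node)
    bash speedrun.sh

    # Serve the web UI for interaction
    python -m scripts.chat_web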

Highlighted Details

  • Budget LLM: Aims to deliver a ChatGPT-like experience for under $100.
  • End-to-End Pipeline: Integrates all stages of LLM development in one codebase.
  • Codebase Analysis: The entire project can be packaged into a single prompt so an LLM can be queried about the code (a generic sketch follows this list).
  • Performance Reporting: Includes detailed evaluation metrics (report.md) for different training stages.
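
For the codebase-analysis point above, one generic way to bundle the source tree into a single prompt file is sketched below. This is an assumption about how such packaging could be done, not the project's own mechanism.

    # Illustrative only: concatenate source and docs into one prompt file.
    # The repository may ship its own packaging helper; this is a generic stand-in.
    find . -name '*.py' -o -name '*.md' | sort | while read -r f; do
      printf '\n===== %s =====\n' "$f"
      cat "$f"
    done > nanochat_prompt.txt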

Maintenance & Community

Authored by Andrej Karpathy, with advice from Alec Radford. The project leverages data from Hugging Face and compute resources from Lambda. Community support and detailed walkthroughs are available in the repository's Discussions.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license is permissive, allowing commercial use and integration into closed-source projects; the only obligation is retaining the copyright and license notice.

Limitations & Caveats

The $100 tier model exhibits limited capabilities, akin to a "kindergartener," reflected in lower benchmark scores. Higher-tier models are not fully supported in the master branch. Running on hardware with less than 80GB VRAM requires significant hyperparameter tuning (e.g., device_batch_size), and single-GPU execution is substantially slower. The project is described as a "strong baseline" and is "nowhere finished."
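
As a purely hypothetical illustration of the VRAM caveat, shrinking the per-device batch size might look like the line below; device_batch_size is the knob this summary names, but the script name and flag syntax are assumptions.

    # Hypothetical: reduce per-device batch size on a GPU with less than 80GB VRAM.
    # Exact entry point and flag syntax are assumptions; see speedrun.sh for the real ones.
    python -m scripts.base_train --device_batch_size=8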

Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 51
  • Issues (30d): 22

Star History

20,635 stars in the last 3 days

Explore Similar Projects

Starred by Elvis Saravia (Founder of DAIR.AI), Tom Moor (Head of Engineering at Linear; Founder of Outline), and 6 more.

  • promptable by cfortuner: TS/JS library for building full-stack AI apps (2k stars; created 2 years ago, last updated 2 years ago)