autoresearch by karpathy

Autonomous LLM research agent for single-GPU training

Created 2 days ago


4,675 stars

Top 10.5% on SourcePulse

View on GitHub
Project Summary


This repository enables autonomous, AI-driven research on Large Language Model (LLM) pretraining. It targets researchers and engineers who want to automate and accelerate LLM experimentation. By giving an AI agent a simplified LLM training setup, the system supports overnight, iterative experiments: the agent modifies model code, trains, evaluates, and refines configurations, potentially discovering improved models without constant human oversight.
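The modify/train/evaluate/refine cycle described above can be sketched as a simple hill-climbing loop. This is an illustrative sketch only, not the repository's actual agent code; `evaluate`, `propose_edit`, and the config dictionary are hypothetical stand-ins for "run a time-boxed training job" and "let the agent edit train.py":

```python
import copy

def research_loop(evaluate, propose_edit, base_config, iterations=10):
    """Iteratively let an agent mutate a config, keeping only improvements.

    evaluate(config) -> val_bpb (lower is better)
    propose_edit(config) -> a modified candidate config
    """
    best_config = copy.deepcopy(base_config)
    best_bpb = evaluate(best_config)
    for _ in range(iterations):
        candidate = propose_edit(copy.deepcopy(best_config))
        bpb = evaluate(candidate)
        if bpb < best_bpb:  # keep the candidate only if validation improves
            best_config, best_bpb = candidate, bpb
    return best_config, best_bpb
```

In the real system the "config" is the entire `train.py` file and each evaluation costs a full 5-minute training run, which is why the loop is left to run overnight.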

How It Works

The core idea is an AI agent that iteratively modifies a single, self-contained Python file (train.py). This file houses the complete GPT model architecture, the optimizer (Muon + AdamW), and the training loop. The agent's task is to explore variations in architecture, hyperparameters, optimizer settings, and batch sizes. Each training run is strictly time-boxed to 5 minutes of wall-clock time, excluding startup and compilation. Performance is measured by validation bits-per-byte (val_bpb), a metric invariant to vocabulary size, enabling fair comparisons across diverse agent-driven modifications. Confining all changes to one file keeps experiment tracking and review manageable.
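Bits-per-byte is vocabulary-independent because it normalizes by raw byte count rather than token count: a tokenizer with a larger vocabulary produces fewer tokens per byte but higher loss per token, and the two effects cancel. A minimal sketch of the conversion, assuming validation loss is a mean cross-entropy in nats per token (the function name and signature are illustrative, not the repo's API):

```python
import math

def val_bpb(mean_loss_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) to bits per byte.

    total_bits = nats/token * tokens / ln(2); divide by the raw byte
    count of the validation text to get a tokenizer-invariant score.
    """
    total_bits = mean_loss_nats * total_tokens / math.log(2)
    return total_bits / total_bytes
```

For example, a mean loss of ln(2) nats per token on data where each token covers exactly one byte gives 1.0 bits per byte.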

Quick Start & Requirements

  • Install Dependencies: uv sync
  • Prepare Data & Tokenizer: uv run prepare.py (one-time, ~5 min)
  • Run Single Training Experiment: uv run train.py (5 min + startup)
  • Prerequisites: A single NVIDIA GPU (tested on H100), Python 3.10+, uv package manager.
  • Links: Tweet context

Highlighted Details

  • Autonomous Research: Leverages AI agents to drive LLM pretraining experiments automatically.
  • Single-File Modifiability: The agent exclusively modifies train.py, ensuring manageable scope and reviewable diffs.
  • Fixed Time Budget: All training runs are capped at 5 minutes, standardizing experiments for direct comparison.
  • Standardized Metric: Utilizes validation bits-per-byte (val_bpb) for objective, vocabulary-independent performance evaluation.
  • Self-Contained: Minimal external dependencies beyond PyTorch, facilitating easy setup and understanding.
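The fixed time budget above can be enforced with a simple wall-clock guard in the training loop. A minimal sketch, assuming training exposes a per-step callable; the 300-second budget matches the README's 5-minute cap, everything else is illustrative:

```python
import time

def timed_train(step_fn, budget_s: float = 300.0) -> int:
    """Run training steps until the wall-clock budget is exhausted.

    Uses time.monotonic() so the measurement is unaffected by system
    clock adjustments. Returns the number of completed steps.
    """
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        step_fn()
        steps += 1
    return steps
```

Checking the clock between steps (rather than killing the process mid-step) means runs may slightly overshoot the budget by up to one step's duration, which keeps checkpoints consistent.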

Maintenance & Community

The provided README text mentions no notable contributors, sponsorships, partnerships, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

The project is released under the MIT License. No explicit restrictions for commercial use or linking with closed-source projects were mentioned.

Limitations & Caveats

This implementation is presented as a simplified, single-GPU baseline for LLM training research. It does not support distributed training or more complex experimental setups out of the box. The sophistication of the research program depends directly on the instructions provided in program.md.

Health Check
Last Commit

6 hours ago

Responsiveness

Inactive

Pull Requests (30d)
37
Issues (30d)
15
Star History
8,659 stars in the last 2 days
