autoresearch by karpathy

Autonomous LLM research agent for single-GPU training

Created 2 days ago


4,675 stars

Top 10.5% on SourcePulse

View on GitHub
Project Summary


This repository enables autonomous, AI-driven research on Large Language Model (LLM) pretraining. It targets researchers and engineers who want to automate and accelerate LLM experimentation. By giving an AI agent a simplified LLM training setup, the system supports overnight, iterative experiments: the agent modifies model code, trains, evaluates, and refines configurations, potentially discovering improved models without constant human oversight.
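The modify/train/evaluate/refine cycle described above can be sketched as a simple hill-climbing loop. This is an illustrative sketch only, not the repository's actual agent code; `evaluate`, `propose_edit`, and the config dictionary are hypothetical stand-ins for "run a time-boxed training job" and "let the agent edit train.py":

```python
import copy

def research_loop(evaluate, propose_edit, base_config, iterations=10):
    """Iteratively let an agent mutate a config, keeping only improvements.

    evaluate(config) -> val_bpb (lower is better)
    propose_edit(config) -> a modified candidate config
    """
    best_config = copy.deepcopy(base_config)
    best_bpb = evaluate(best_config)
    for _ in range(iterations):
        candidate = propose_edit(copy.deepcopy(best_config))
        bpb = evaluate(candidate)
        if bpb < best_bpb:  # keep the candidate only if validation improves
            best_config, best_bpb = candidate, bpb
    return best_config, best_bpb
```

In the real system the "config" is the entire `train.py` file and each evaluation costs a full 5-minute training run, which is why the loop is left to run overnight.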

How It Works

The core idea is an AI agent that iteratively modifies a single, self-contained Python file (train.py). This file houses the complete GPT model architecture, the optimizer (Muon + AdamW), and the training loop. The agent's task is to explore variations in architecture, hyperparameters, optimizer settings, and batch sizes. Each training run is strictly time-boxed to 5 minutes of wall-clock time, excluding startup and compilation. Performance is measured by validation bits-per-byte (val_bpb), a metric invariant to vocabulary size, enabling fair comparisons across diverse agent-driven modifications. Confining all changes to one file keeps experiment tracking and review manageable.
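Bits-per-byte is vocabulary-independent because it normalizes by raw byte count rather than token count: a tokenizer with a larger vocabulary produces fewer tokens per byte but higher loss per token, and the two effects cancel. A minimal sketch of the conversion, assuming validation loss is a mean cross-entropy in nats per token (the function name and signature are illustrative, not the repo's API):

```python
import math

def val_bpb(mean_loss_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) to bits per byte.

    total_bits = nats/token * tokens / ln(2); divide by the raw byte
    count of the validation text to get a tokenizer-invariant score.
    """
    total_bits = mean_loss_nats * total_tokens / math.log(2)
    return total_bits / total_bytes
```

For example, a mean loss of ln(2) nats per token on data where each token covers exactly one byte gives 1.0 bits per byte.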

Quick Start & Requirements

  • Install Dependencies: uv sync
  • Prepare Data & Tokenizer: uv run prepare.py (one-time, ~5 min)
  • Run Single Training Experiment: uv run train.py (5 min + startup)
  • Prerequisites: A single NVIDIA GPU (tested on H100), Python 3.10+, uv package manager.
  • Links: Tweet context

Highlighted Details

  • Autonomous Research: Leverages AI agents to drive LLM pretraining experiments automatically.
  • Single-File Modifiability: The agent exclusively modifies train.py, ensuring manageable scope and reviewable diffs.
  • Fixed Time Budget: All training runs are capped at 5 minutes, standardizing experiments for direct comparison.
  • Standardized Metric: Utilizes validation bits-per-byte (val_bpb) for objective, vocabulary-independent performance evaluation.
  • Self-Contained: Minimal external dependencies beyond PyTorch, facilitating easy setup and understanding.
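The fixed time budget above can be enforced with a simple wall-clock guard in the training loop. A minimal sketch, assuming training exposes a per-step callable; the 300-second budget matches the README's 5-minute cap, everything else is illustrative:

```python
import time

def timed_train(step_fn, budget_s: float = 300.0) -> int:
    """Run training steps until the wall-clock budget is exhausted.

    Uses time.monotonic() so the measurement is unaffected by system
    clock adjustments. Returns the number of completed steps.
    """
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        step_fn()
        steps += 1
    return steps
```

Checking the clock between steps (rather than killing the process mid-step) means runs may slightly overshoot the budget by up to one step's duration, which keeps checkpoints consistent.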

Maintenance & Community

The provided README text mentions no notable contributors, sponsorships, partnerships, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

The project is released under the MIT License. No explicit restrictions for commercial use or linking with closed-source projects were mentioned.

Limitations & Caveats

This implementation is presented as a simplified, single-GPU baseline for LLM training research. It does not support distributed training or more complex experimental setups out of the box. The sophistication of the research program depends directly on the instructions provided in program.md.

Health Check
Last Commit

6 hours ago

Responsiveness

Inactive

Pull Requests (30d)
37
Issues (30d)
15
Star History
8,659 stars in the last 2 days
