karpathy
Autonomous LLM research agent for single-GPU training
Top 10.5% on SourcePulse
Summary
This repository facilitates autonomous AI-driven research for Large Language Model (LLM) pretraining. It is designed for researchers and engineers aiming to automate and accelerate LLM experimentation. By providing an AI agent with a simplified LLM training setup, the system allows for overnight, iterative experiments where the agent modifies model code, trains, evaluates, and refines configurations, potentially discovering improved models without constant human oversight.
How It Works
The core innovation lies in an AI agent's ability to iteratively modify a single, self-contained Python file (train.py). This file houses the complete GPT model architecture, optimizer (Muon + AdamW), and training loop. The agent's task is to explore variations in architecture, hyperparameters, optimizer settings, and batch sizes. Each training run is strictly time-boxed to a 5-minute wall-clock duration, excluding startup and compilation. Performance is gauged by validation bits-per-byte (val_bpb), a metric invariant to vocabulary size, enabling fair comparisons across diverse agent-driven modifications. This focused approach simplifies experiment tracking and review, making the process highly manageable.
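The two mechanics described above (a hard 5-minute wall-clock budget and scoring by bits-per-byte) can be sketched in a few lines. This is an illustrative sketch, not code from the repository; the helper names `val_bpb` and `train_time_boxed` are hypothetical:

```python
import math
import time

def val_bpb(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy over a validation set (in nats)
    into bits-per-byte. Dividing by raw byte count rather than token
    count makes the score invariant to tokenizer vocabulary size."""
    return total_loss_nats / (math.log(2) * total_bytes)

def train_time_boxed(step_fn, budget_s: float = 5 * 60) -> int:
    """Run training steps until the wall-clock budget is exhausted.
    The clock starts here, i.e. after any startup/compilation,
    so only training time counts against the budget."""
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        step_fn()
        steps += 1
    return steps
```

Because val_bpb normalizes by bytes, an agent that swaps in a different tokenizer is still scored on the same footing as the baseline.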
Quick Start & Requirements
uv sync
uv run prepare.py (one-time, ~5 min)
uv run train.py (5 min + startup)

Highlighted Details
All agent-driven changes are confined to a single file, train.py, ensuring manageable scope and reviewable diffs.

Maintenance & Community
No specific details regarding notable contributors, sponsorships, partnerships, or community channels (e.g., Discord, Slack) were present in the provided README text.
Licensing & Compatibility
The project is released under the MIT License. No explicit restrictions for commercial use or linking with closed-source projects were mentioned.
Limitations & Caveats
This implementation is presented as a simplified, single-GPU baseline for LLM training research. It does not support distributed training or more complex experimental setups out of the box. The sophistication of the research program depends directly on the instructions provided in program.md.