autoresearch-claude-code by drivelineresearch

Agentic loop for autonomous code and ML model optimization

Created 4 months ago

329 stars

Top 82.7% on SourcePulse


1 Expert Loves This Project

John Resig

Author of jQuery; Chief Software Architect at Khan Academy

Project Summary

An autonomous experiment loop skill for Claude Code that automates iterative optimization of code, ML models, and build systems. It targets developers and researchers who want to improve a measurable performance metric with minimal manual intervention: the agent proposes changes, benchmarks them, and keeps only those that improve the metric.

How It Works

This project implements an autonomous experiment loop as a pure skill for Claude Code, eliminating the need for a separate MCP server. Given a goal, benchmark, and files to modify, the agent autonomously creates a branch, sets up a session document and benchmark script, runs a baseline, and then enters an infinite loop. It writes experiment configurations to autoresearch.jsonl, executes experiments via ./autoresearch.sh, logs results, and commits successful iterations. User prompts can steer the ongoing experiments. This approach enables continuous, data-driven refinement by automatically exploring and retaining winning ideas.
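
The keep-what-wins logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: `run_loop`, `Experiment`, and `to_jsonl` are hypothetical names, the benchmark is an ordinary Python callable rather than ./autoresearch.sh, the candidate list is finite rather than an infinite loop, and "committing" is modeled by keeping the best configuration in memory instead of making a git commit.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable


@dataclass
class Experiment:
    """One logged trial (hypothetical record layout, not the project's schema)."""
    idea: str
    metric: float
    kept: bool


def run_loop(
    baseline: dict,
    candidates: Iterable[tuple[str, dict]],
    benchmark: Callable[[dict], float],
    higher_is_better: bool = True,
) -> tuple[dict, list[Experiment]]:
    """Measure a baseline, then try each candidate and keep only improvements."""
    best_config = baseline
    best_metric = benchmark(baseline)
    log = [Experiment("baseline", best_metric, True)]

    for idea, config in candidates:
        metric = benchmark(config)
        if higher_is_better:
            improved = metric > best_metric
        else:
            improved = metric < best_metric
        log.append(Experiment(idea, metric, improved))
        if improved:
            # Stand-in for "commit the successful iteration".
            best_config, best_metric = config, metric

    return best_config, log


def to_jsonl(log: list[Experiment]) -> str:
    """Serialize the log as one JSON object per line."""
    return "\n".join(json.dumps(asdict(entry)) for entry in log)


if __name__ == "__main__":
    # Toy benchmark (assumption): the score peaks when depth == 5.
    def toy_benchmark(config: dict) -> float:
        return -float((config["depth"] - 5) ** 2)

    ideas = [
        ("deeper trees", {"depth": 4}),
        ("much deeper trees", {"depth": 8}),
        ("slightly deeper", {"depth": 5}),
    ]
    best, history = run_loop({"depth": 2}, ideas, toy_benchmark)
    print(best)
    print(to_jsonl(history))
```

A discarded experiment still appears in the log with `kept` set to false, so the history shows which ideas were tried and rejected as well as which were retained.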

Quick Start & Requirements

There are three installation options: Claude's built-in capabilities (Option A), specifying the plugin directory (Option B), or manual symlinks created with install.sh (Option C). The only hard requirement is Claude Code. Dependencies are managed with uv. GPU/CUDA support is optional and auto-detected for XGBoost, CatBoost, LightGBM, and PyTorch. The example setup clones the openbiomechanics dataset and installs core dependencies with uv sync; optional dependency groups, such as --extra all, add the full set of model backends. Links: Repo, uv Docs.

Highlighted Details

Autonomous Optimization Loop: Continuously iterates through ideas, measures outcomes against a defined benchmark, and retains improvements.
Versatile Application: Capable of optimizing ML models (e.g., R², RMSE), code performance (runtime, memory), build systems (bundle size), frontend metrics (Lighthouse), and prompt engineering evaluations.
Demonstrated Efficacy: The Fastball Velocity Prediction example raised R² from 0.44 to 0.78 (a 78% relative increase) and cut RMSE from 3.53 mph to 2.20 mph (a 38% reduction) over 22 autonomous experiments.
Extensive Model Zoo: Integrates 19 models across Boosting, Neural, Linear, Bayesian, and Ensemble categories, featuring lazy imports and automatic GPU acceleration where available.
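
The percentages quoted for the Fastball Velocity Prediction example are relative changes, which can be checked from the before and after values. The helper below is a hypothetical illustration (`relative_change` is not part of the project). With the rounded figures given here, the R² change works out to roughly 77% rather than 78%; the quoted figure presumably comes from unrounded values, which is an assumption.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Rounded values as reported in the summary above.
r2_change = relative_change(0.44, 0.78)    # R², higher is better
rmse_change = relative_change(3.53, 2.20)  # RMSE in mph, lower is better

if __name__ == "__main__":
    print(f"R2 change:   {r2_change:+.1f}%")
    print(f"RMSE change: {rmse_change:+.1f}%")
```

A negative result means the metric decreased, which is an improvement for an error metric such as RMSE.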

Licensing & Compatibility

The project is licensed under the MIT license, which is permissive and generally suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool is primarily designed as a plugin for Claude Code, potentially limiting its standalone utility. While installation options are provided, setting up the full example with all model dependencies requires careful management using uv. The project may be considered experimental, and its effectiveness is contingent on the ability to define clear, measurable metrics for the optimization target.


Explore Similar Projects

awesome-LLM-driven-kernel-generation by flagos-ai

AKO4ALL by TongmingLAIC

AutoSOTA by tsinghua-fib-lab

awesome-autoresearch by yibie

MiniMax-M2.7 by MiniMax-AI

alpaca_farm by tatsu-lab

HGM by metauto-ai

awesome-autoresearch by webfuse-com

awesome-machine-learning-in-compilers by zwang4

autoresearch by uditgoenka

dgm by jennyzzt

original_performance_takehome by anthropics