autoresearch-claude-code  by drivelineresearch

Agentic loop for autonomous code and ML model optimization

Created 1 month ago
283 stars

Top 92.2% on SourcePulse

Project Summary

An autonomous experiment loop skill for Claude Code that automates iterative optimization of code, ML models, and build systems. It targets developers and researchers who want to improve a measurable performance metric through a self-directed, benchmark-driven process with minimal manual intervention.

How It Works

This project implements an autonomous experiment loop as a pure skill for Claude Code, eliminating the need for a separate MCP server. Given a goal, benchmark, and files to modify, the agent autonomously creates a branch, sets up a session document and benchmark script, runs a baseline, and then enters an infinite loop. It writes experiment configurations to autoresearch.jsonl, executes experiments via ./autoresearch.sh, logs results, and commits successful iterations. User prompts can steer the ongoing experiments. This approach enables continuous, data-driven refinement by automatically exploring and retaining winning ideas.
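The keep-if-better pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `run_loop`, `toy_benchmark`, `make_proposer`, and the `experiments.jsonl` file name are all hypothetical, the loop is bounded rather than infinite, and "keeping" a result here means retaining a config in memory rather than making a git commit.

```python
import json
import random
import tempfile
from pathlib import Path


def run_loop(benchmark, propose, baseline, log_path, max_iters=20):
    """Try candidates one at a time, log each run, keep only improvements."""
    best_config = dict(baseline)
    best_score = benchmark(best_config)  # baseline run
    with open(log_path, "a", encoding="utf-8") as log:
        record = {"iter": 0, "config": best_config, "score": best_score, "kept": True}
        log.write(json.dumps(record) + "\n")
        for i in range(1, max_iters + 1):
            candidate = propose(best_config)
            score = benchmark(candidate)
            kept = score > best_score  # higher is better in this sketch
            record = {"iter": i, "config": candidate, "score": score, "kept": kept}
            log.write(json.dumps(record) + "\n")
            if kept:
                best_config, best_score = candidate, score
    return best_config, best_score


def toy_benchmark(config):
    # Hypothetical metric (higher is better), peaking at x == 3.
    return -(config["x"] - 3.0) ** 2


def make_proposer(seed=0):
    # Hypothetical idea generator: random perturbation of the current best.
    rng = random.Random(seed)

    def propose(config):
        return {"x": config["x"] + rng.uniform(-1.0, 1.0)}

    return propose


log_file = Path(tempfile.mkdtemp()) / "experiments.jsonl"
best, score = run_loop(toy_benchmark, make_proposer(), {"x": 0.0}, log_file)
print(best, score)
```

In the real tool the agent proposes the changes and a benchmark script measures them; the sketch only shows the control flow of measuring, logging, and retaining winners.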

Quick Start & Requirements

There are three installation routes: Claude's built-in capabilities (Option A), specifying the plugin directory (Option B), or manual symlinks created with install.sh (Option C). The only hard requirement is Claude Code. The example setup manages its dependencies with uv: clone the openbiomechanics dataset, install the core dependencies with uv sync, and add optional groups such as --extra all for comprehensive model backend support. GPU/CUDA support is optional and is auto-detected for specific models (XGBoost, CatBoost, LightGBM, PyTorch).
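One common way to auto-detect optional backends is to probe for them without importing them. The sketch below is an assumption about how such detection could look, not the project's actual code: `available_backends` and `cuda_available` are hypothetical helpers, and only the standard-library `importlib.util.find_spec` and PyTorch's `torch.cuda.is_available()` are real APIs.

```python
import importlib.util

# Import names of the optional backends mentioned above.
OPTIONAL_BACKENDS = ("xgboost", "catboost", "lightgbm", "torch")


def available_backends(names=OPTIONAL_BACKENDS):
    """Return the packages that are installed, without importing them."""
    return [n for n in names if importlib.util.find_spec(n) is not None]


def cuda_available():
    """True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        import torch  # lazy import: only paid for when PyTorch is present
    except Exception:
        return False  # treat a broken install as "no GPU"
    return bool(torch.cuda.is_available())


print("backends:", available_backends())
print("cuda:", cuda_available())
```

Deferring the heavy imports keeps startup fast when only a subset of backends is installed.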

Highlighted Details

  • Autonomous Optimization Loop: Continuously iterates through ideas, measures outcomes against a defined benchmark, and retains improvements.
  • Versatile Application: Capable of optimizing ML models (e.g., R², RMSE), code performance (runtime, memory), build systems (bundle size), frontend metrics (Lighthouse), and prompt engineering evaluations.
  • Demonstrated Efficacy: The Fastball Velocity Prediction example achieved a 78% increase in R² (from 0.44 to 0.78) and a 38% reduction in RMSE (from 3.53 mph to 2.20 mph) over 22 autonomous experiments.
  • Extensive Model Zoo: Integrates 19 models across Boosting, Neural, Linear, Bayesian, and Ensemble categories, featuring lazy imports and automatic GPU acceleration where available.
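The percentage figures quoted for the Fastball Velocity Prediction example are relative changes against the baseline. The short check below recomputes them from the rounded endpoints given above; `relative_change` is a hypothetical helper. The rounded inputs give roughly +77% for R², slightly under the reported 78%, presumably because the reported figure was computed from unrounded values (an assumption).

```python
def relative_change(before, after):
    """Percentage change of `after` relative to `before`."""
    return (after - before) / before * 100.0


r2_gain = relative_change(0.44, 0.78)      # R²: higher is better
rmse_change = relative_change(3.53, 2.20)  # RMSE in mph: lower is better

print(f"R² change:   {r2_gain:+.1f}%")     # prints +77.3%
print(f"RMSE change: {rmse_change:+.1f}%")  # prints -37.7%
```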

Licensing & Compatibility

The project is licensed under the MIT license, which is permissive and generally suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool is primarily designed as a plugin for Claude Code, potentially limiting its standalone utility. While installation options are provided, setting up the full example with all model dependencies requires careful management using uv. The project may be considered experimental, and its effectiveness is contingent on the ability to define clear, measurable metrics for the optimization target.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 62 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

alpaca_farm by tatsu-lab

0.1%
843
RLHF simulation framework for accessible instruction-following/alignment research
Created 3 years ago
Updated 1 year ago
Starred by Li Jiang (Coauthor of AutoGen; Engineer at Microsoft) and Joe Walnes (Head of Experimental Projects at Stripe).

autoresearch by uditgoenka

4.7%
4k
Autonomous iteration engine for Claude Code
Created 1 month ago
Updated 1 week ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Vincent Weisser (Cofounder of Prime Intellect), and 3 more.

dgm by jennyzzt

0.3%
2k
Self-improving agent system
Created 11 months ago
Updated 8 months ago