codex-autoresearch by leo-lilinxiao

Autonomous goal-driven experimentation for software engineering

Created 3 months ago

1,969 stars

Top 21.6% on SourcePulse

Project Summary

Codex Autoresearch Skill provides an autonomous, goal-driven system for iterative software improvement, inspired by Karpathy's autoresearch concept. It lets engineers automate complex tasks such as code refactoring, bug fixing, performance optimization, and security auditing by continuously cycling through modify, verify, and retain/discard steps. The system targets developers who want to improve code quality and efficiency through unattended, long-running experimentation, with the aim of saving engineering time and potentially surfacing novel solutions.

How It Works

The core mechanism is a self-directed modify-verify-decide loop. Users provide a natural language goal, which Codex Autoresearch translates into a plan. In each iteration it makes a single atomic code change, commits it, and verifies its impact against a defined mechanical metric (e.g., test coverage, error count, latency). Successful changes are kept, failed ones are reverted, and the lessons from each outcome are recorded for future iterations. The loop repeats autonomously and, when it gets stuck, adapts through strategies such as REFINE, PIVOT, and web search to keep moving toward the stated objective.
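The loop described above can be illustrated with a minimal Python sketch. It is not the project's implementation: `run_loop`, `LoopResult`, and the toy `count_any` metric are hypothetical names, the "code" is just an in-memory list of type annotations, and git commits and reverts are replaced by keeping or dropping a candidate value.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class LoopResult:
    """Outcome of a run: final metric plus what was kept, discarded, and learned."""
    best_metric: float
    kept: List[str] = field(default_factory=list)
    discarded: List[str] = field(default_factory=list)
    lessons: List[str] = field(default_factory=list)


def run_loop(
    state: List[str],
    changes: Sequence[Tuple[str, Callable[[List[str]], List[str]]]],
    metric: Callable[[List[str]], float],
    lower_is_better: bool = True,
) -> Tuple[List[str], LoopResult]:
    """Apply one change at a time; keep it only if the metric improves."""
    best = metric(state)
    result = LoopResult(best_metric=best)
    for name, apply_change in changes:
        candidate = apply_change(state)      # modify: one atomic change
        score = metric(candidate)            # verify: mechanical metric
        improved = score < best if lower_is_better else score > best
        if improved:                         # decide: keep
            state, best = candidate, score
            result.kept.append(name)
        else:                                # decide: discard (state untouched)
            result.discarded.append(name)
            result.lessons.append(f"{name}: {score} did not beat {best}")
    result.best_metric = best
    return state, result


def count_any(types: List[str]) -> float:
    """Toy metric (an assumption for this sketch): number of `any` annotations."""
    return float(types.count("any"))


# Hypothetical candidate changes; the second one makes things worse on purpose.
changes = [
    ("narrow-first", lambda t: ["string"] + t[1:]),
    ("widen-last", lambda t: t[:-1] + ["any"]),
    ("narrow-second", lambda t: t[:1] + ["number"] + t[2:]),
]

final_state, outcome = run_loop(["any", "any", "boolean"], changes, count_any)
print(final_state, outcome.kept, outcome.discarded)
```

In this toy run the regressing change is discarded and recorded as a lesson, while the two narrowing changes are kept. A real loop would generate changes from the plan rather than from a fixed list.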

Quick Start & Requirements

Install: Clone the repository and copy the skill into your Codex project's .agents/skills/ directory, or use the Codex skill installer: $skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch.
Run: Start Codex in your project and provide a natural language goal, e.g., $codex-autoresearch I want to get rid of all the `any` types in my TypeScript code.
Prerequisites: Requires a functional Codex environment. The system automatically probes for CPU/GPU/RAM and necessary toolchains.
Links: INSTALL.md, GUIDE.md, EXAMPLES.md are available within the repository's docs/ directory.
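To give a feel for what probing for CPU, GPU, RAM, and toolchains can involve, here is a rough Python sketch using only the standard library. It is an illustration under assumptions, not the skill's actual probe: `probe_environment` and `total_ram_bytes` are hypothetical names, GPU detection is reduced to checking whether `nvidia-smi` is on the PATH, and the RAM lookup relies on `os.sysconf`, which is not available on every platform.

```python
import os
import shutil
from typing import Dict, Iterable, Optional


def total_ram_bytes() -> Optional[int]:
    """Best-effort physical RAM size; None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def probe_environment(tools: Iterable[str]) -> Dict[str, object]:
    """Collect a small report of hardware hints and available toolchains."""
    return {
        "cpu_count": os.cpu_count() or 1,
        "ram_bytes": total_ram_bytes(),
        # Assumption for this sketch: an NVIDIA GPU is hinted at by nvidia-smi.
        "gpu_tool_found": shutil.which("nvidia-smi") is not None,
        "toolchains": {tool: shutil.which(tool) is not None for tool in tools},
    }


report = probe_environment(["git", "node", "python3"])
print(report)
```

The tool names passed in are examples only; which toolchains matter depends on the project being improved.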

Highlighted Details

Autonomous Modes: Automatically selects and chains specialized modes (loop, plan, debug, fix, security, ship, exec) based on user goals.
Cross-Run Learning: Persists learned strategies and failures in autoresearch-lessons.md to bias future hypothesis generation.
Smart Stuck Recovery: Employs REFINE, PIVOT, and web search protocols to overcome obstacles without user intervention.
Parallel Experiments: Supports testing multiple hypotheses concurrently using isolated git worktrees for faster exploration.
Session Resume: Restores interrupted runs from the last consistent state using snapshots (autoresearch-state.json).
CI/CD Integration: Offers a non-interactive exec mode for automation pipelines with JSON output and defined exit codes.
Dual-Gate Verification: Utilizes separate "Verify" (progress) and "Guard" (regression prevention) commands for robust change validation.
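The dual-gate idea from the last item can be sketched in Python as two subprocess calls. This is a simplified illustration, not the project's code: `dual_gate` is a hypothetical helper, and it assumes that the Verify command prints its metric as a number on the last line of stdout and that the Guard command signals "no regression" with exit code 0.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def dual_gate(
    verify_cmd: List[str],
    guard_cmd: List[str],
    baseline: float,
    lower_is_better: bool = True,
) -> Tuple[bool, Optional[float]]:
    """Return (accept, metric): accept only if Verify progresses AND Guard passes."""
    verify = subprocess.run(verify_cmd, capture_output=True, text=True)
    if verify.returncode != 0:
        return False, None
    lines = verify.stdout.strip().splitlines()
    try:
        metric = float(lines[-1])  # assumed convention: metric on the last line
    except (IndexError, ValueError):
        return False, None

    progressed = metric < baseline if lower_is_better else metric > baseline
    if not progressed:
        return False, metric       # no progress, so the guard is not even run

    guard = subprocess.run(guard_cmd, capture_output=True, text=True)
    return guard.returncode == 0, metric


# Stand-in commands so the sketch runs anywhere Python does.
verify_ok = [sys.executable, "-c", "print(3)"]
guard_ok = [sys.executable, "-c", "raise SystemExit(0)"]
guard_fail = [sys.executable, "-c", "raise SystemExit(1)"]

print(dual_gate(verify_ok, guard_ok, baseline=5.0))
print(dual_gate(verify_ok, guard_fail, baseline=5.0))
```

Keeping the two gates separate means a change that improves the target metric can still be rejected when it breaks something the guard protects, such as an existing test suite.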

Maintenance & Community

The repository includes a CONTRIBUTING.md file. No specific details regarding active maintainers, sponsorships, community channels (like Discord/Slack), or a public roadmap are provided in the README.

Licensing & Compatibility

The project is licensed under the MIT license. This permissive license allows for commercial use, modification, and distribution, including integration within closed-source projects.

Limitations & Caveats

A primary adoption blocker is the dependency on the Codex platform. The system's autonomous nature means outcomes can sometimes be unpredictable, especially with ambiguous goals. While it handles errors and dirty worktrees robustly, extensive overnight or parallel runs may require significant computational resources. The effectiveness of autonomous decision-making relies heavily on the quality and measurability of the defined goals and verification metrics.

Explore Similar Projects

evanflow by evanklem

ralph-wiggum by fstandhartinger

claude-elixir-phoenix by oliver-kriska

supergoal by robzilla1738

compound-product by snarktank

MiniMax-M2.7 by MiniMax-AI

continuous-claude by AnandChowdhary

agentsys by agent-sh

most-capable-agent-system-prompt by fainir

autoresearch by uditgoenka

ml-intern by huggingface

ralph by snarktank