neosigmaai
Automated agent optimization framework
Summary
This repository provides a framework for creating self-improving AI coding agents. It automates agent refinement: the agent learns from benchmark failures, iteratively improves its system prompts and tools, and validates each change against a self-maintained evaluation suite. The README reports a ~40% score improvement on the Tau3 benchmark, which makes the framework useful for researchers and developers who want to improve agent capabilities without constant manual intervention.
How It Works
The system runs a continuous loop: run a benchmark, analyze failures, improve the agent's code (agent/agent.py), gate the changes, record results, and update learnings. The core innovation is the agent's ability to autonomously identify failure patterns, update its own logic, and maintain a regression test suite (workspace/suite.json). A change is accepted only if it passes the self-maintained eval suite and also scores higher on the full test set than the previous best. Learnings are logged persistently in workspace/learnings.md to preserve context across iterations.
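The gating step in this loop can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the names gate_change, run_loop, and GateResult are hypothetical, the scores are made-up placeholders, and accepting the first passing run when no previous best exists is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class GateResult:
    accepted: bool
    reason: str


def gate_change(suite_passed: bool, full_score: float,
                best_score: Optional[float]) -> GateResult:
    """Accept a change only if the regression suite passes AND the
    full-test-set score is strictly higher than the previous best."""
    if not suite_passed:
        return GateResult(False, "regression suite failed")
    # Assumption: with no previous best, a passing run becomes the baseline.
    if best_score is not None and full_score <= best_score:
        return GateResult(
            False, f"score {full_score:.2f} does not beat best {best_score:.2f}")
    return GateResult(True, "suite passed and score improved")


def run_loop(candidates: Iterable[Tuple[bool, float]],
             best_score: Optional[float] = None) -> Tuple[Optional[float], List[str]]:
    """Walk through candidate changes, gate each one, and collect log lines.

    Each candidate is a (suite_passed, full_score) pair standing in for a
    real benchmark run; the returned log plays the role of learnings.md.
    """
    learnings: List[str] = []
    for i, (suite_passed, full_score) in enumerate(candidates, start=1):
        result = gate_change(suite_passed, full_score, best_score)
        if result.accepted:
            best_score = full_score
        verdict = "accepted" if result.accepted else "rejected"
        learnings.append(f"iteration {i}: {verdict} ({result.reason})")
    return best_score, learnings


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    best, log = run_loop([(True, 0.40), (True, 0.35), (False, 0.90), (True, 0.55)])
    print(best)
    print("\n".join(log))
```

Requiring both conditions means a change that raises the headline score but breaks a previously fixed case is still rejected, which is the role the regression suite plays in the loop described above.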
Quick Start & Requirements
Commands: docker compose build, docker compose run autoeval python prepare.py (initialization), and docker compose run autoeval python benchmark.py (running benchmarks).
Prerequisites: OPENAI_API_KEY and a compatible coding agent (e.g., Claude Code, Codex CLI). The TAU2_DATA_DIR environment variable must be set.
Highlighted Details
A self-maintained regression test suite (workspace/suite.json) that the agent updates dynamically.
A persistent log (workspace/learnings.md) of agent actions, successes, and requests for human intervention.
Maintenance & Community
No specific details regarding maintainers, sponsorships, or community channels (e.g., Discord, Slack) are provided in the README.
Licensing & Compatibility
The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README content.
Limitations & Caveats
The agent's modifications are currently restricted to the agent/agent.py file. The system relies on external coding agents and the OpenAI API, introducing external dependencies. While the framework is benchmark-agnostic, the provided example uses tau-bench.
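The summary does not say how the agent/agent.py restriction is enforced. One hypothetical way to check it, assuming a list of changed file paths is available from whatever tooling drives the loop, is a small guard like the sketch below; ALLOWED_PATHS and disallowed_changes are illustrative names, not part of the framework.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: the only file the agent may modify, per the limitation above.
ALLOWED_PATHS = {PurePosixPath("agent/agent.py")}


def disallowed_changes(changed_paths: Iterable[str]) -> List[str]:
    """Return every changed path that falls outside the allowed set."""
    return [p for p in changed_paths if PurePosixPath(p) not in ALLOWED_PATHS]


if __name__ == "__main__":
    offending = disallowed_changes(["agent/agent.py", "benchmark.py"])
    if offending:
        print("rejecting change, files outside agent/agent.py:", offending)
```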