BetterForAll
Evolving AI agents via adversarial and self-improvement loops
Self-Improving Agents -- A Progression offers a structured framework for building increasingly capable AI agents that improve themselves. Aimed at researchers and power users, it moves through four levels, ending in adversarial arenas, with the goal of producing agents that are more robust and adaptable than agents tuned only against static benchmarks.
How It Works
The project defines four levels of self-improving code agents.
Level 1, AutoResearch, runs a basic loop in which an LLM iteratively refines code against a benchmark. Because the benchmark provides verifiable rewards, the loop can run autonomously.
Level 2, Feedback Loop, adds a reviewer agent that gives structured explanations of why code fails, and uses the asymmetric information between reviewer and coder to optimize the cost-performance trade-off.
Level 3, HyperAgent Loop, lets agents rewrite their own source code, with a multi-stage validation process to keep those changes safe and stable.
Level 4, Arena Loop, uses adversarial co-evolution: code agents compete against evolving test agents in an "arms race" that builds robustness to unforeseen edge cases. Level 4b adds tournament-style competition so that strategies themselves evolve.
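The Level 1 and Level 2 loops can be pictured with a toy, self-contained sketch. Everything in it is an assumption, not the project's code: the "LLM" is a stub (`propose`) that cycles through a fixed pool of candidate programs for an absolute-value task, the benchmark is four input/output pairs, `score` stands in for the verifiable reward, and the reviewer (`review`) simply lists failing cases in a structured form.

```python
"""Toy sketch of a Level 1/2 style improvement loop (illustrative only)."""
from typing import Callable, List, Tuple

Case = Tuple[int, int]  # (input, expected output)

# Hypothetical benchmark for an absolute-value task.
BENCHMARK: List[Case] = [(0, 0), (1, 1), (-3, 3), (5, 5)]

# Hypothetical candidate programs the stub "LLM" proposes, in order.
CANDIDATES: List[Callable[[int], int]] = [
    lambda x: x,                    # wrong for negative inputs
    lambda x: -x,                   # right only for negatives and zero
    lambda x: x if x >= 0 else -x,  # correct
]


def score(program: Callable[[int], int], cases: List[Case]) -> float:
    """Verifiable reward: fraction of benchmark cases the program passes."""
    passed = sum(1 for x, want in cases if program(x) == want)
    return passed / len(cases)


def review(program: Callable[[int], int], cases: List[Case]) -> List[str]:
    """Level 2 style reviewer: one structured explanation per failing case."""
    return [f"input={x}: expected {want}, got {program(x)}"
            for x, want in cases if program(x) != want]


def propose(step: int, feedback: List[str]) -> Callable[[int], int]:
    """Stub LLM. A real agent would put `feedback` into the prompt; this ignores it."""
    return CANDIDATES[step % len(CANDIDATES)]


def improve(max_steps: int = 10):
    best, best_score, feedback = None, -1.0, []
    for step in range(max_steps):
        candidate = propose(step, feedback)
        s = score(candidate, BENCHMARK)
        if s > best_score:  # keep only strict improvements
            best, best_score = candidate, s
        feedback = review(best, BENCHMARK)  # explain the current best's failures
        if best_score == 1.0:
            break
    return best, best_score, feedback
```

With these stubs, `improve()` keeps the identity function (score 0.75) after the first step, rejects the second candidate (0.5), and stops at the third with a score of 1.0 and an empty review.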
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. A GEMINI_API_KEY from Google AI Studio is required. Each level runs through its own run.py script (e.g., python autoresearch/run.py), and the --task argument selects a task such as 'snake', 'support', or 'email_validation'. For full experiment runs and cross-level comparison, use run_all.py and analyze_results.py.
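A small pre-flight wrapper can catch a missing key before any run starts. Only GEMINI_API_KEY, the autoresearch/run.py script, and the --task argument come from the README; the helper names (`build_command`, `missing_api_key`, `run_level`) and the dry-run behaviour are hypothetical, and other levels' script paths are not given here.

```python
"""Hypothetical pre-flight helper for launching one level (illustrative only)."""
import os
import subprocess
import sys
from typing import List, Mapping, Optional


def build_command(script: str, task: str) -> List[str]:
    """Assemble the equivalent of `python <script> --task <task>`."""
    return [sys.executable, script, "--task", task]


def missing_api_key(env: Mapping[str, str]) -> bool:
    """True if GEMINI_API_KEY is absent or empty in the given environment."""
    return not env.get("GEMINI_API_KEY")


def run_level(script: str, task: str,
              env: Optional[Mapping[str, str]] = None,
              dry_run: bool = True) -> List[str]:
    """Check the key, then (unless dry_run) launch the script; returns the command."""
    env = dict(os.environ if env is None else env)
    if missing_api_key(env):
        raise RuntimeError("Set GEMINI_API_KEY (from Google AI Studio) first.")
    cmd = build_command(script, task)
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd
```

For example, `run_level("autoresearch/run.py", "snake", dry_run=False)` would run Level 1 on the snake task, assuming the key is set in the shell environment.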
Maintenance & Community
The provided README does not detail specific maintenance contributors, sponsorships, or community channels such as Discord or Slack.
Licensing & Compatibility
License information is not specified in the provided README text.
Limitations & Caveats
Relying too heavily on static benchmarks can give a false sense of security and produce brittle agents. Level 4's adversarial setup requires more compute than the earlier levels. As a research progression, the project may still change, including its APIs.
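Both caveats show up in a toy arena loop. This is a sketch under stated assumptions, not the project's Level 4 implementation: code agents are three fixed candidate functions for a hypothetical clamp-to-[0, 10] task, test agents are lists of integer inputs checked against a reference `oracle`, and `mutate` is a hand-written, hypothetical mutation. All three agents pass the static benchmark, yet evolved test agents expose two of them; and because every surviving code agent faces every test agent, the number of matches grows with both populations.

```python
"""Toy arena loop in the spirit of Level 4 (illustrative only)."""
from typing import Callable, Dict, List

Agent = Callable[[int], int]


def oracle(x: int) -> int:
    """Reference behaviour: clamp x into [0, 10]."""
    return max(0, min(10, x))


# Hypothetical code agents; all three pass the static benchmark below.
CODE_AGENTS: Dict[str, Agent] = {
    "no_upper": lambda x: max(0, x),
    "no_lower": lambda x: min(10, x),
    "clamp": lambda x: max(0, min(10, x)),
}

STATIC_BENCHMARK: List[int] = [1, 5, 9]


def passes(agent: Agent, inputs: List[int]) -> bool:
    return all(agent(x) == oracle(x) for x in inputs)


def mutate(tests: List[List[int]]) -> List[List[int]]:
    """Hypothetical mutation: shift each input 15 below and 15 above."""
    return [[x + d for x in t for d in (-15, 15)] for t in tests]


def arena(rounds: int = 2):
    code = dict(CODE_AGENTS)
    tests = [list(STATIC_BENCHMARK)]
    matches = 0  # counts code-vs-test matches only
    for _ in range(rounds):
        # Test agents evolve: keep mutants that break at least one code agent.
        mutants = mutate(tests)
        tests = tests + [t for t in mutants
                         if any(not passes(a, t) for a in code.values())]
        # Every code agent faces every test agent.
        matches += len(code) * len(tests)
        # Code agents: only those passing every test agent survive.
        code = {n: a for n, a in code.items()
                if all(passes(a, t) for t in tests)}
    return code, tests, matches
```

With the default two rounds, the first mutated test agent eliminates `no_upper` and `no_lower`, no later mutant breaks `clamp`, and `arena()` returns only `clamp` after 8 matches.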
2 months ago
Inactive
facebookresearch
jennyzzt