MaximeRobeyns
Coding agent framework for autonomous self-improvement
Top 99.0% on SourcePulse
Summary
This project introduces a self-improving coding agent framework in which the agent autonomously refines its own codebase through an iterative loop of evaluation and enhancement. It targets AI researchers and developers who want an agent whose capabilities improve continuously and autonomously.
How It Works
The core is an iterative loop: the agent evaluates its performance on benchmarks, archives the results, then modifies its own codebase to improve. The cycle then repeats, so development is driven by the agent itself.
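The evaluate, archive, improve cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: `Archive`, `self_improvement_loop`, `evaluate`, and `improve` are hypothetical names, and the "agent" in the toy demo is just an integer standing in for a codebase version.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, TypeVar

AgentT = TypeVar("AgentT")


@dataclass
class Iteration:
    """One archived benchmark result (hypothetical record type)."""
    index: int
    score: float


@dataclass
class Archive:
    """Keeps every past result so later iterations can consult them."""
    history: List[Iteration] = field(default_factory=list)

    def record(self, index: int, score: float) -> None:
        self.history.append(Iteration(index, score))

    def best(self) -> Iteration:
        return max(self.history, key=lambda it: it.score)


def self_improvement_loop(
    agent: AgentT,
    evaluate: Callable[[AgentT], float],
    improve: Callable[[AgentT, Archive], AgentT],
    iterations: int,
) -> Archive:
    """Repeat: evaluate the agent, archive the score, produce the next agent."""
    archive = Archive()
    for i in range(iterations):
        score = evaluate(agent)          # 1. benchmark the current agent
        archive.record(i, score)         # 2. archive the result
        agent = improve(agent, archive)  # 3. derive the next agent version
    return archive


def run_toy_demo() -> Archive:
    """Toy stand-ins: the agent is an int, and each 'improvement' adds 1."""
    return self_improvement_loop(
        agent=0,
        evaluate=lambda a: float(a),
        improve=lambda a, archive: a + 1,
        iterations=3,
    )
```

In a real setup, `evaluate` would run a benchmark suite and `improve` would ask the agent to edit its own code; keeping the archive separate from the agent lets each improvement step see the full history rather than only the latest score.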
Quick Start & Requirements
Setup requires cloning the repository and building a Docker image (make image or make image-mac) for isolated execution. Essential prerequisites include exporting API keys for at least one LLM provider (e.g., OpenAI, Anthropic, Gemini) and potentially Google Cloud credentials for Gemini. Local Python dependencies are installed via pip install -r base_agent/requirements.txt and pip install swebench. Interactive testing uses make int, followed by python -m agent_code.agent --server true -p "<prompt>", visualized at http://localhost:8080. The self-improvement loop runs via runner.py. Configuration is detailed in base_agent/src/config.py.
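Since at least one LLM provider key must be exported before running the agent, a small preflight check can catch a missing key early. The sketch below is illustrative only: the environment variable names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`) and the helper names are assumptions, not taken from the project, so check them against the repository's own documentation.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names; the project may expect different ones.
PROVIDER_KEY_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Gemini": "GEMINI_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_KEY_VARS.items()
        if env.get(var, "").strip()
    ]


def check_environment(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Raise if no provider key is exported; otherwise list what was found."""
    providers = configured_providers(env)
    if not providers:
        expected = ", ".join(PROVIDER_KEY_VARS.values())
        raise RuntimeError(f"No LLM provider key found; export one of: {expected}")
    return providers


if __name__ == "__main__":
    print("Configured providers:", ", ".join(check_environment()))
```

Passing the environment as a mapping keeps the check easy to test without touching the real process environment.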
Highlighted Details
The web visualization (http://localhost:8080) shows execution flow and call graph.
[…] results/.
Maintenance & Community
Authored by Maxime Robeyns, Martin Szummer, and Laurence Aitchison, associated with the ICLR 2025 Workshop on Scaling Self-Improving Foundation Models. No specific community channels or roadmap details are provided.
Licensing & Compatibility
The license type is not specified in the README, requiring clarification for commercial use or closed-source integration. Docker usage suggests Linux/macOS compatibility, with a specific build target for Apple Silicon.
Limitations & Caveats
The "base agent" is minimal, lacking efficient file editing tools, devtools (tree-sitter, LSP), or advanced reasoning structures. Future work includes enhancing benchmark curation, reducing self-improvement variance, and integrating more robust software engineering task capabilities.
10 months ago
Inactive