self_improving_coding_agent by MaximeRobeyns

Coding agent framework for autonomous self-improvement

Created 1 year ago

368 stars

Top 76.5% on SourcePulse

Project Summary

This project introduces a self-improving coding agent framework that autonomously refines its own codebase via an iterative evaluation and enhancement loop. It targets AI researchers and developers who want agents whose capabilities improve continuously and autonomously.

How It Works

The core is an iterative loop: the agent evaluates its performance on benchmarks, archives the results, then modifies its own codebase. The modified agent becomes the starting point for the next iteration, so development is driven by the agent itself.
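The evaluate, archive, improve cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual runner.py: the names ArchiveEntry, Archive, and self_improvement_loop are hypothetical, the agent is represented by a plain string, and the evaluate and improve callables stand in for benchmark runs and LLM-driven code edits.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ArchiveEntry:
    """One archived iteration (hypothetical structure, for illustration only)."""
    iteration: int
    agent_version: str
    score: float


@dataclass
class Archive:
    """Holds the results of every iteration so later steps can consult them."""
    entries: List[ArchiveEntry] = field(default_factory=list)

    def add(self, entry: ArchiveEntry) -> None:
        self.entries.append(entry)

    def best(self) -> ArchiveEntry:
        return max(self.entries, key=lambda e: e.score)


def self_improvement_loop(
    initial_agent: str,
    evaluate: Callable[[str], float],
    improve: Callable[[str, Archive], str],
    iterations: int,
) -> Archive:
    """Run evaluate -> archive -> improve for a fixed number of iterations."""
    archive = Archive()
    agent = initial_agent
    for i in range(iterations):
        score = evaluate(agent)                      # 1. benchmark the current agent
        archive.add(ArchiveEntry(i, agent, score))   # 2. archive the result
        agent = improve(agent, archive)              # 3. produce the next agent version
    return archive


# Toy stand-ins: "score" is the length of the version string,
# and "improving" appends a character to it.
archive = self_improvement_loop(
    initial_agent="v0",
    evaluate=lambda agent: float(len(agent)),
    improve=lambda agent, _archive: agent + "+",
    iterations=3,
)
print(archive.best())
```

In a real run, evaluate would execute benchmarks inside the Docker sandbox and improve would have the agent edit its own source. Whether the next iteration starts from the latest or the best archived version is a design choice this sketch does not settle.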

Quick Start & Requirements

Setup requires cloning the repository and building a Docker image (make image, or make image-mac on Apple Silicon) for isolated execution. Export an API key for at least one LLM provider (e.g., OpenAI, Anthropic, Gemini); Gemini may also require Google Cloud credentials. Install local Python dependencies with pip install -r base_agent/requirements.txt and pip install swebench. For interactive testing, run make int, then python -m agent_code.agent --server true -p "<prompt>", and view the run at http://localhost:8080. The self-improvement loop runs via runner.py. Configuration is detailed in base_agent/src/config.py.
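Because at least one provider key must be exported before anything runs, a small preflight check can catch a missing key early. The sketch below is an illustration only: the environment variable names (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY) are assumed from common provider conventions and are not confirmed by this summary, and configured_providers and check_api_keys are hypothetical helpers rather than part of the project.

```python
import os
from typing import List, Mapping

# Assumed variable names; check the project's README for the exact ones it reads.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Gemini": "GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose (assumed) key variable is set and non-empty."""
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


def check_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Raise if no provider key is configured; otherwise return the providers found."""
    providers = configured_providers(env)
    if not providers:
        expected = ", ".join(PROVIDER_ENV_VARS.values())
        raise RuntimeError(f"No LLM provider key found; export one of: {expected}")
    return providers


if __name__ == "__main__":
    try:
        print("Configured providers:", ", ".join(check_api_keys()))
    except RuntimeError as err:
        print(err)
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.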

Highlighted Details

Autonomous Self-Improvement: Agent iteratively refines its own code.
Docker-based Isolation: Crucial for safety due to shell command execution.
Multi-LLM Provider Support: Facilitates experimentation across various models.
Web Browsing Capability: Modal integration allows access to external web content.
Interactive Visualization: Web interface (http://localhost:8080) shows execution flow and call graph.
Structured Output: Experiment results, code, and traces organized in results/.

Maintenance & Community

Authored by Maxime Robeyns, Martin Szummer, and Laurence Aitchison, associated with the ICLR 2025 Workshop on Scaling Self-Improving Foundation Models. No specific community channels or roadmap details are provided.

Licensing & Compatibility

The README does not specify a license, so the terms for commercial use or closed-source integration need clarification. Docker usage suggests Linux/macOS compatibility, with a specific build target for Apple Silicon.

Limitations & Caveats

The "base agent" is minimal, lacking efficient file editing tools, devtools (tree-sitter, LSP), or advanced reasoning structures. Future work includes enhancing benchmark curation, reducing self-improvement variance, and integrating more robust software engineering task capabilities.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

25 stars in the last 30 days