self_improving_coding_agent  by MaximeRobeyns

Coding agent framework for autonomous self-improvement

Created 10 months ago
254 stars

Top 99.0% on SourcePulse

Project Summary

This project introduces a self-improving coding agent framework: the agent autonomously refines its own codebase through an iterative loop of evaluation and enhancement. It targets AI researchers and developers who want agents whose capabilities improve continuously without manual intervention.

How It Works

The core is an iterative loop with three steps: the agent evaluates its performance on benchmarks, archives the results, then modifies its own codebase in an attempt to improve. The modified agent becomes the starting point for the next iteration, so development is driven by the agent itself rather than by a human in the loop.
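The evaluate, archive, improve cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: the names Archive, Iteration, and self_improvement_loop, and the evaluate and improve callables, are all hypothetical and do not come from the repository.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Iteration:
    """One archived benchmark result (hypothetical structure)."""
    index: int
    score: float


@dataclass
class Archive:
    """Keeps the history of benchmark scores across iterations."""
    iterations: List[Iteration] = field(default_factory=list)

    def record(self, index: int, score: float) -> None:
        self.iterations.append(Iteration(index, score))

    def best(self) -> Iteration:
        # Highest-scoring iteration seen so far.
        return max(self.iterations, key=lambda it: it.score)


def self_improvement_loop(
    agent: Any,
    evaluate: Callable[[Any], float],
    improve: Callable[[Any, Archive], Any],
    num_iterations: int,
) -> Archive:
    """Run evaluate -> archive -> improve for a fixed number of iterations.

    `evaluate` and `improve` are placeholders: in a real system they would
    run benchmarks and let the agent edit its own code, respectively.
    """
    archive = Archive()
    for i in range(num_iterations):
        score = evaluate(agent)          # 1. benchmark the current agent
        archive.record(i, score)         # 2. archive the result
        agent = improve(agent, archive)  # 3. produce the next agent version
    return archive


# Toy usage: the "agent" is just a number and improving it adds 1.0.
toy_archive = self_improvement_loop(
    agent=0.0,
    evaluate=lambda a: a,
    improve=lambda a, arch: a + 1.0,
    num_iterations=3,
)
```

Passing the archive into improve reflects the idea that each improvement step can consult earlier results; how the real agent uses its archive is not described on this page.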

Quick Start & Requirements

Setup requires cloning the repository and building a Docker image for isolated execution (make image, or make image-mac on Apple Silicon). Export an API key for at least one LLM provider (e.g., OpenAI, Anthropic, Gemini); Gemini may additionally need Google Cloud credentials. Install local Python dependencies with pip install -r base_agent/requirements.txt and pip install swebench. For interactive testing, run make int, then python -m agent_code.agent --server true -p "<prompt>"; the run is visualized at http://localhost:8080. The self-improvement loop is started via runner.py, and configuration lives in base_agent/src/config.py.
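Before building the image, it can help to confirm that at least one provider key is actually exported. The following is a small sketch, not part of the project: the function configured_providers is hypothetical, and the environment variable names are assumed from common provider conventions rather than confirmed by this page, so check the repository's README for the exact names.

```python
import os
from typing import List, Mapping

# ASSUMPTION: these variable names follow common provider conventions;
# the project may expect different ones.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Gemini": "GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose API key variable is set and non-empty."""
    return [
        name
        for name, var in PROVIDER_KEYS.items()
        if env.get(var, "").strip()
    ]


# Example with an explicit mapping instead of the real environment:
example = configured_providers({"ANTHROPIC_API_KEY": "sk-example"})
```

Calling configured_providers() with no argument inspects the current process environment; an empty list means no provider key was found under the assumed names.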

Highlighted Details

  • Autonomous Self-Improvement: Agent iteratively refines its own code.
  • Docker-based Isolation: Crucial for safety due to shell command execution.
  • Multi-LLM Provider Support: Facilitates experimentation across various models.
  • Web Browsing Capability: Modal integration allows access to external web content.
  • Interactive Visualization: Web interface (http://localhost:8080) shows execution flow and call graph.
  • Structured Output: Experiment results, code, and traces organized in results/.

Maintenance & Community

Authored by Maxime Robeyns, Martin Szummer, and Laurence Aitchison, associated with the ICLR 2025 Workshop on Scaling Self-Improving Foundation Models. No specific community channels or roadmap details are provided.

Licensing & Compatibility

The license type is not specified in the README, requiring clarification for commercial use or closed-source integration. Docker usage suggests Linux/macOS compatibility, with a specific build target for Apple Silicon.

Limitations & Caveats

The "base agent" is minimal, lacking efficient file editing tools, devtools (tree-sitter, LSP), or advanced reasoning structures. Future work includes enhancing benchmark curation, reducing self-improvement variance, and integrating more robust software engineering task capabilities.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 30 days
