autoagent by kevinrgu

Autonomous agent harness engineering framework

Created 3 months ago

4,524 stars

Top 10.7% on SourcePulse


3 Experts Love This Project

Travis Fischer

Founder of Agentic

Wing Lian

Founder of Axolotl AI

Magnus Müller

Cofounder of Browser Use

Project Summary

This project tackles the complex, time-consuming work of engineering AI agent harnesses by introducing an autonomous, iterative development loop. It targets engineers and researchers who want to optimize agent performance without manually editing harness code. The core benefit is that an AI agent can build, test, and refine its own harness overnight, guided by benchmark scores.

How It Works

AutoAgent uses a meta-agent approach. Human engineers describe the desired agent behavior and the engineering loop in a program.md file. A meta-agent then autonomously edits the primary harness file, agent.py, which holds the agent's configuration, tools, and orchestration logic. On each iteration, the system runs the benchmark tasks in the tasks/ directory, scores the result, and keeps the change to agent.py only if the score improves; otherwise it discards it. This is hill-climbing on the benchmark score: each kept change makes the harness better on the benchmark, though there is no guarantee of reaching a global optimum. The design shifts the engineer's job from editing harness code directly to programming the meta-agent's instructions.
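The keep-or-discard loop described above is greedy hill-climbing. The following is a minimal sketch of that loop, not AutoAgent's code: the Harness class, hill_climb, and the propose and evaluate callables are hypothetical stand-ins for the meta-agent's edit to agent.py and a benchmark run over tasks/, and the toy objective exists only to make the example runnable.

```python
import random
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for the harness configuration that the meta-agent edits in agent.py.
@dataclass(frozen=True)
class Harness:
    params: tuple[float, ...]


def hill_climb(
    initial: Harness,
    propose: Callable[[Harness], Harness],  # stands in for the meta-agent editing agent.py
    evaluate: Callable[[Harness], float],   # stands in for scoring a run over tasks/
    iterations: int,
) -> tuple[Harness, float]:
    """Greedy hill-climbing: keep a candidate only if it strictly beats the current best."""
    best, best_score = initial, evaluate(initial)
    for _ in range(iterations):
        candidate = propose(best)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score  # keep the change
        # otherwise the candidate is discarded and the next proposal starts from `best`
    return best, best_score


# Toy usage: an invented objective whose maximum is at params == (0.5, 0.5).
rng = random.Random(0)


def toy_propose(h: Harness) -> Harness:
    # Perturb one randomly chosen parameter by a small amount.
    p = list(h.params)
    i = rng.randrange(len(p))
    p[i] += rng.uniform(-0.1, 0.1)
    return Harness(tuple(p))


def toy_evaluate(h: Harness) -> float:
    return -sum((x - 0.5) ** 2 for x in h.params)


start = Harness((0.0, 0.0))
best, score = hill_climb(start, toy_propose, toy_evaluate, iterations=200)
```

Because a candidate replaces the current best only on strict improvement, the returned score can never be worse than the starting score, which mirrors the "kept only if they improve benchmark performance" rule.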

Quick Start & Requirements

Primary install/run command: uv manages dependencies and Docker provides environment isolation. Key commands include uv sync, docker build -f Dockerfile.base -t autoagent-base . (the trailing dot is the build context), and uv run harbor run ... (see the README for the full invocation).
Non-default prerequisites: Docker, Python 3.10+, uv, and model-provider credentials (e.g., OPENAI_API_KEY).
Links: The README points to the Harbor documentation for details of the task format.
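Before running the commands above, the non-default prerequisites can be checked up front. This is a hypothetical helper, not part of AutoAgent: the function name and defaults are illustrative, and OPENAI_API_KEY is only the example credential from the list above, so substitute whichever variable your model provider uses.

```python
import os
import shutil
import sys


def missing_prerequisites(
    required_env: tuple[str, ...] = ("OPENAI_API_KEY",),  # example credential; provider-specific
    required_tools: tuple[str, ...] = ("docker", "uv"),
    min_python: tuple[int, int] = (3, 10),
) -> list[str]:
    """Return a human-readable entry for each prerequisite that is not satisfied."""
    missing: list[str] = []
    found = sys.version_info[:2]
    if found < min_python:
        missing.append(
            f"Python {min_python[0]}.{min_python[1]}+ (found {found[0]}.{found[1]})"
        )
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(f"executable '{tool}' on PATH")
    for var in required_env:
        # Treat an empty value the same as an unset variable.
        if not os.environ.get(var):
            missing.append(f"environment variable {var}")
    return missing
```

Calling print(missing_prerequisites()) prints an empty list when Docker, uv, Python 3.10+, and the credential are all present; otherwise each entry names what to install or export.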

Highlighted Details

Autonomous agent harness development via a meta-agent that modifies agent.py.
Single-file, registry-driven harness design (agent.py) for simplicity and maintainability.
Score-driven iteration loop, where changes are kept only if they improve benchmark performance.
Harbor-compatible task format allows for consistent evaluation across different datasets.
Docker isolation keeps agent execution separate from the host system.
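The summary does not show how agent.py's registry is structured, so the following is only a hypothetical illustration of what "registry-driven" can mean in a single-file harness: tools live in one name-to-function mapping, so adding, removing, or swapping a tool is a one-entry change that a meta-agent can make without touching the orchestration code. ToolRegistry, REGISTRY, and the two example tools are invented names.

```python
from dataclasses import dataclass, field
from typing import Callable

ToolFn = Callable[..., str]


@dataclass
class ToolRegistry:
    """Hypothetical name -> tool mapping kept in the same file as the orchestration logic."""
    tools: dict[str, ToolFn] = field(default_factory=dict)

    def register(self, name: str) -> Callable[[ToolFn], ToolFn]:
        # Decorator that records a tool under `name` and returns it unchanged.
        def decorator(fn: ToolFn) -> ToolFn:
            if name in self.tools:
                raise ValueError(f"tool {name!r} already registered")
            self.tools[name] = fn
            return fn
        return decorator

    def call(self, name: str, **kwargs: object) -> str:
        # Orchestration code dispatches by name, so it never hard-codes individual tools.
        if name not in self.tools:
            raise KeyError(f"unknown tool {name!r}")
        return self.tools[name](**kwargs)


REGISTRY = ToolRegistry()


@REGISTRY.register("echo")
def echo(text: str) -> str:
    return text


@REGISTRY.register("word_count")
def word_count(text: str) -> str:
    return str(len(text.split()))
```

Keeping everything in one file and one mapping gives an automated editor a small, predictable surface to modify, which fits the simplicity and maintainability goal listed above.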

Maintenance & Community

The maintainers are actively seeking engineers and list hello@thirdlayer.inc as the contact for inquiries. The README does not list community channels or contributors.

Licensing & Compatibility

License type: MIT.
Compatibility notes: The MIT license is highly permissive and allows commercial use and integration into closed-source projects; the main condition is that the copyright and license notice be kept in all copies or substantial portions of the software.

Limitations & Caveats

Users must write their own evaluation tasks and add them to the tasks/ directory. The project appears to be in active development, with a product launch anticipated soon. Docker images and containers accumulate across runs and must be cleaned up regularly.
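For the Docker cleanup, a small wrapper around the standard Docker CLI prune subcommands is one option. The docker container prune and docker image prune commands and their -f (skip confirmation) flag are real; the prune_docker wrapper and its dry_run flag are hypothetical. Note that these commands are host-wide: they remove every stopped container and every dangling (untagged) image, not just ones AutoAgent created, while tagged images such as autoagent-base are left in place.

```python
import subprocess

# Standard Docker CLI cleanup commands; -f skips the interactive confirmation prompt.
PRUNE_COMMANDS: list[list[str]] = [
    ["docker", "container", "prune", "-f"],  # remove all stopped containers
    ["docker", "image", "prune", "-f"],      # remove dangling (untagged) images only
]


def prune_docker(dry_run: bool = True) -> list[list[str]]:
    """Return the prune commands, executing them only when dry_run is False."""
    commands = [list(cmd) for cmd in PRUNE_COMMANDS]  # copy so callers cannot mutate the defaults
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if Docker reports a failure.
            subprocess.run(cmd, check=True)
    return commands
```

Calling prune_docker() only lists what would run; prune_docker(dry_run=False) performs the cleanup, for example between overnight sessions.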

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

61 stars in the last 30 days