china-qijizhifeng: Observability-driven evolution for coding agents
Summary
Agentic Harness Engineering (AHE) is an open observability system that automatically evolves the harness of a coding agent around a fixed base model. It is aimed at researchers and engineers who want to improve agent performance by optimizing the harness itself: system prompts, tool descriptions, tool implementations, and middleware. The project reports high benchmark pass rates and evolved harnesses that generalize across base models.
How It Works
AHE runs an iterative evaluate-analyze-improve loop driven by three observability layers: component tracking (harness components versioned with git), experience distillation (an Agent Debugger that processes execution traces), and decision support (an Evolve Agent that proposes evidence-backed edits). Harness components such as prompts, tools, and skills are refined based on this trace analysis. Each iteration's evaluation tests the predictions behind earlier edits and can falsify them; the results guide further refinement and are encoded as general engineering experience.
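The loop can be pictured with the minimal Python sketch below. It is not AHE's implementation: Harness, Edit, History, evolve, and the evaluate/analyze/propose callables are hypothetical stand-ins, an in-memory list of snapshots replaces git, and the rule for keeping an edit (at least one predicted fix confirmed and no drop in total passes) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: a harness maps component names (e.g. "system_prompt",
# "tool:bash") to their text.
Harness = dict[str, str]


@dataclass
class Edit:
    component: str              # harness component to change
    new_text: str               # replacement text for that component
    predicted_fixes: set[str]   # task IDs the edit is expected to flip to "pass"


@dataclass
class History:
    # Stand-in for git-based component tracking: one snapshot per kept version.
    snapshots: list[Harness] = field(default_factory=list)

    def commit(self, harness: Harness) -> None:
        self.snapshots.append(dict(harness))


def evolve(
    harness: Harness,
    evaluate: Callable[[Harness], dict[str, bool]],   # task_id -> passed?
    analyze: Callable[[dict[str, bool]], list[str]],  # distill results into findings
    propose: Callable[[Harness, list[str]], Edit],    # evidence-backed edit
    iterations: int,
) -> tuple[Harness, History]:
    history = History()
    history.commit(harness)
    results = evaluate(harness)
    for _ in range(iterations):
        findings = analyze(results)
        edit = propose(harness, findings)
        candidate = {**harness, edit.component: edit.new_text}
        new_results = evaluate(candidate)
        # Check the edit's prediction: which predicted fixes actually pass now?
        confirmed = {t for t in edit.predicted_fixes if new_results.get(t)}
        # Assumed acceptance rule, not taken from AHE: keep the edit only if a
        # prediction held and the total pass count did not drop.
        if confirmed and sum(new_results.values()) >= sum(results.values()):
            harness, results = candidate, new_results
            history.commit(harness)
    return harness, history
```

Passing the three callables in keeps the loop agnostic to how evaluation, trace analysis, and edit proposal are implemented, loosely mirroring the separation into observability layers.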
Quick Start & Requirements
Requires Python ≥ 3.13, uv, and tmux.
Installation: clone the repository with git, then run uv sync.
Configuration: set environment variables for the LLM and sandbox API keys (e.g., LLM_API_KEY, E2B_API_KEY).
Sandboxes: experiments run in E2B sandboxes, either SaaS or self-hosted. Pre-build the E2B templates with: uv run python scripts/build_templates.py --dataset-dir /path/to/dataset -j 16
Launch: ./scripts/evolve.sh configs/experiments/exp-003-simple-code-gpt54.yaml
Datasets: given either as local paths or referenced by name and version via dataset: "<name>@<ver>".
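Before launching a long run it can help to check the configuration up front. The sketch below is a hypothetical preflight helper, not part of the AHE repository: missing_env, parse_dataset, and the rule that an existing filesystem path takes precedence over a "<name>@<ver>" reference are assumptions. Only the variable names LLM_API_KEY and E2B_API_KEY come from the README, and the full set of required variables may be larger.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Names from the README example; the real required set may differ (assumption).
REQUIRED_ENV = ("LLM_API_KEY", "E2B_API_KEY")


def missing_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def parse_dataset(spec: str) -> dict[str, str]:
    """Classify a dataset setting as a local path or a '<name>@<ver>' reference.

    Assumption: a spec that exists on disk is a path; otherwise the last '@'
    separates name and version.
    """
    if Path(spec).exists():
        return {"kind": "path", "path": str(Path(spec).resolve())}
    name, sep, version = spec.rpartition("@")
    if sep and name and version:
        return {"kind": "registry", "name": name, "version": version}
    raise ValueError(f"not a local path or <name>@<ver> reference: {spec!r}")
```

A launcher wrapper could call missing_env() and abort if the list is non-empty, so that a missing key fails fast instead of partway through an experiment.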
Maintenance & Community
No specific details regarding maintainers, community channels, sponsorships, or active development signals were found in the provided README content.
Licensing & Compatibility
Released under the MIT license, which permits commercial use and integration into closed-source projects.
Limitations & Caveats
The Agent Debugger is only partially open-sourced. Users of the SaaS E2B service must keep experiments within their account's concurrent-sandbox limit; otherwise experiments can stall.
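One generic way to keep a batch of jobs under a sandbox concurrency cap is an asyncio semaphore, sketched below. run_bounded is hypothetical, each job is assumed to be an async callable that creates and uses one sandbox, and max_concurrent should be set to the limit allowed by the E2B account; nothing here is an E2B or AHE API.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    max_concurrent: int,
) -> list[T]:
    """Run jobs with at most `max_concurrent` in flight; results keep input order."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # wait for a free sandbox slot before starting
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Excess jobs simply wait for a slot instead of requesting sandboxes the account cannot provide.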