stanford-iris-lab
Agent scaffold for terminal LLM evaluation
Summary
This project provides Meta-Harness, an agent scaffold designed to improve LLM agent performance in terminal environments, specifically on the Terminal-Bench 2.0 benchmark. It cuts the time an agent spends on initial environment exploration, which helps developers who evaluate or deploy agents in interactive command-line scenarios.
How It Works
Meta-Harness extends the Terminus-KIRA agent with "environment bootstrapping." Before the agent runs, it captures a snapshot of the sandbox environment, including the working directory, available tools, and system configuration. The snapshot is injected into the agent's initial prompt, supplying context that would otherwise take several exploration turns to gather (e.g., ls, which python3), so the agent can start on the task sooner. According to the project, the agent itself was discovered through automated harness evolution.
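The bootstrapping idea described above can be illustrated with a minimal sketch. This is not the project's implementation: the function names (capture_snapshot, render_snapshot, bootstrap_prompt), the probed tool list, and the prompt layout are all hypothetical, and the sketch inspects the local machine, whereas the real harness presumably gathers the same facts inside the sandbox.

```python
import platform
import shutil
from pathlib import Path

# Hypothetical tool list; the real harness may probe a different set.
TOOLS_TO_PROBE = ("python3", "pip", "git", "gcc", "node", "curl")


def capture_snapshot(workdir=".", max_entries=20):
    """Collect facts an agent would otherwise spend turns discovering."""
    root = Path(workdir).resolve()
    entries = sorted(p.name + ("/" if p.is_dir() else "") for p in root.iterdir())
    return {
        "cwd": str(root),
        "entries": entries[:max_entries],
        "omitted": max(0, len(entries) - max_entries),
        # shutil.which mirrors `which <tool>`; None means not on PATH.
        "tools": {name: shutil.which(name) for name in TOOLS_TO_PROBE},
        "system": f"{platform.system()} {platform.release()} ({platform.machine()})",
    }


def render_snapshot(snapshot):
    """Format the snapshot as plain text (layout is an assumption)."""
    lines = [
        "[environment snapshot]",
        f"cwd: {snapshot['cwd']}",
        f"system: {snapshot['system']}",
        "files: " + (", ".join(snapshot["entries"]) or "(empty)"),
    ]
    if snapshot["omitted"]:
        lines.append(f"({snapshot['omitted']} more entries omitted)")
    for name, path in snapshot["tools"].items():
        lines.append(f"{name}: {path or 'not found'}")
    return "\n".join(lines)


def bootstrap_prompt(task, snapshot):
    """Prepend the rendered snapshot to the task description."""
    return render_snapshot(snapshot) + "\n\n[task]\n" + task


if __name__ == "__main__":
    print(bootstrap_prompt("Fix the failing unit test.", capture_snapshot()))
```

Capping the directory listing keeps the injected context small; the snapshot only pays off if it costs fewer tokens than the exploration turns it replaces.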
Quick Start & Requirements
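Before launching the run command listed in this section, it can help to confirm that both prerequisites are in place. The following is a small preflight sketch, not part of the project: missing_prerequisites and RUN_ARGS are hypothetical names, and the flags are copied verbatim from the quick-start command without interpreting them.

```python
import os
import shlex
import shutil

# Flags copied verbatim from the quick-start command; their meaning is
# defined by the harbor CLI, not by this sketch.
RUN_ARGS = [
    "harbor", "run",
    "--agent-import-path", "agent:AgentHarness",
    "-d", "terminal-bench@2.0",
    "-m", "anthropic/claude-opus-4-6",
    "-e", "runloop",
    "-n", "20",
    "--n-attempts", "5",
]


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; empty means ready."""
    env = os.environ if env is None else env
    problems = []
    if which("harbor") is None:
        problems.append("harbor not on PATH (try: pip install harbor)")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    problems = missing_prerequisites()
    for problem in problems:
        print("missing:", problem)
    if not problems:
        # Print rather than execute, so the run stays an explicit step.
        print("ready to run:", shlex.join(RUN_ARGS))
```

The sketch only prints the command; starting a benchmark run is left as a deliberate manual step.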
Install: pip install harbor
Prerequisite: ANTHROPIC_API_KEY environment variable.
Run: harbor run --agent-import-path agent:AgentHarness -d terminal-bench@2.0 -m anthropic/claude-opus-4-6 -e runloop -n 20 --n-attempts 5
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project states "More details coming soon," indicating that the current documentation is incomplete. Specific limitations regarding platform compatibility, unsupported features, or known bugs are not detailed.