meta-harness  by stanford-iris-lab

Automated search framework for optimizing model harnesses

Created 1 month ago
971 stars

Top 37.6% on SourcePulse

Project Summary

Meta-Harness is a framework for automating the search for, and optimization of, task-specific model harnesses: the components around a base model that manage its interaction with the environment (storage, retrieval, display). It targets researchers and developers who want to improve AI system performance through end-to-end harness optimization. The repository includes the core framework and two reference experiments, one for text classification and one for Terminal-Bench 2.0.
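To make the term "harness" concrete, here is a minimal sketch of a wrapper that handles storage, retrieval, and display around a base model. It is not taken from the repository: the class name `SimpleHarness`, its methods, and the word-overlap ranking are all hypothetical, chosen only to illustrate the three responsibilities named above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SimpleHarness:
    """Hypothetical harness: decides what a base model stores, retrieves, and sees."""

    model: Callable[[str], str]  # stand-in for a call to a base model
    memory: List[str] = field(default_factory=list)
    top_k: int = 2  # how many memory items to show the model

    def store(self, item: str) -> None:
        # Storage: append the item to a flat in-memory list.
        self.memory.append(item)

    def retrieve(self, query: str) -> List[str]:
        # Retrieval: rank stored items by word overlap with the query
        # (an assumed toy ranking, not the project's method).
        words = set(query.lower().split())
        ranked = sorted(
            self.memory,
            key=lambda item: len(words & set(item.lower().split())),
            reverse=True,
        )
        return ranked[: self.top_k]

    def display(self, query: str) -> str:
        # Display: build the prompt text the model actually receives.
        context = "\n".join(self.retrieve(query))
        return f"Context:\n{context}\n\nQuery: {query}"

    def run(self, query: str) -> str:
        # One interaction: show the prompt, get an answer, remember it.
        answer = self.model(self.display(query))
        self.store(f"{query} -> {answer}")
        return answer
```

In this framing, choices such as `top_k` or the ranking rule are the kind of thing a harness search could vary while the base model stays fixed.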

How It Works

The framework runs an automated search over model harnesses; the reference experiments cover memory-system search and scaffold evolution. The aim is to make AI agents more efficient and effective by refining the logic that governs their interactions. The framework is reusable, and its onboarding process uses a coding assistant to generate a domain specification. The examples assume a "proposer agent" (e.g., Claude Code) and require a specific wrapper around it for logging interactions.
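The overall shape of such a search can be sketched as a propose-evaluate loop with a logging wrapper around the proposer. This is an illustrative sketch under stated assumptions, not the repository's implementation: `logged`, `search`, `make_toy_proposer`, `toy_evaluate`, and the single `top_k` setting are hypothetical, and a seeded random perturbation stands in for a real proposer agent.

```python
import json
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]               # hypothetical harness configuration
History = List[Tuple[Config, float]]  # (candidate, score) pairs seen so far
Proposer = Callable[[History], Config]


def logged(proposer: Proposer, log: List[str]) -> Proposer:
    """Wrap a proposer so that every proposal is recorded as a JSON line."""
    def wrapper(history: History) -> Config:
        candidate = proposer(history)
        log.append(json.dumps(
            {"n_history": len(history), "candidate": candidate}, sort_keys=True))
        return candidate
    return wrapper


def search(proposer: Proposer, evaluate: Callable[[Config], float],
           rounds: int) -> Tuple[Config, float]:
    """Propose, evaluate, and record for a fixed number of rounds; return the best."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    history: History = []
    for _ in range(rounds):
        candidate = proposer(history)
        history.append((candidate, evaluate(candidate)))
    return max(history, key=lambda pair: pair[1])


def make_toy_proposer(seed: int) -> Proposer:
    """Assumed stand-in for a proposer agent: perturb the best config so far."""
    rng = random.Random(seed)

    def propose(history: History) -> Config:
        if not history:
            return {"top_k": 1}
        best, _ = max(history, key=lambda pair: pair[1])
        return {"top_k": max(1, best["top_k"] + rng.choice([-1, 1]))}
    return propose


def toy_evaluate(config: Config) -> float:
    """Assumed toy objective that peaks at top_k == 5."""
    return -float((config["top_k"] - 5) ** 2)


if __name__ == "__main__":
    log: List[str] = []
    best, score = search(logged(make_toy_proposer(0), log), toy_evaluate, rounds=30)
    print(best, score, len(log))
```

Keeping the log outside the proposer means the same wrapper can be placed around any callable with this signature, which is the role the description assigns to the proposer-agent wrapper.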

Quick Start & Requirements

  • Install / run: dependencies and execution are managed with uv (uv sync and uv run).
  • Prerequisites: a "proposer agent" wrapper is required; the examples use Claude Code. Detailed setup, runtime, and command information is in the subdirectory READMEs.
  • Setup time / resource footprint: not specified.
  • Links:
    • Paper: https://arxiv.org/abs/2603.28052
    • Terminal-Bench 2.0 Artifact: https://github.com/stanford-iris-lab/meta-harness-tbench2-artifact
    • Onboarding Guide: ONBOARDING.md (within the repo)

Highlighted Details

  • Framework for automated search and end-to-end optimization of model harnesses.
  • Reference experiments for text classification (memory-system search) and Terminal-Bench 2.0 (scaffold evolution).
  • Onboarding flow designed to guide adaptation to new domains via a coding assistant.

Maintenance & Community

  • The codebase is a cleaned-up version of the code from the paper; it has had minimal testing and has only been verified to run.
  • No community channels (e.g., Discord, Slack) or roadmap are specified in the README.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README.
  • Commercial use or closed-source linking compatibility is undetermined due to the missing license information.

Limitations & Caveats

The codebase is a cleaned-up version of the code from the paper, and testing has been limited to verifying that it runs. Detailed setup and runtime instructions are located in the subdirectory READMEs. Adapting the framework to a new domain requires implementing a "proposer agent" wrapper, and the provided examples are tailored to Claude Code. Because the README states no license, usage restrictions cannot be assessed.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 2
  • Star history: 256 stars in the last 30 days
