harness-1 by pat-jj

Search agents trained with reinforcement learning for long-horizon tasks

Created 2 months ago

880 stars

Top 40.1% on SourcePulse

View on GitHub

1 Expert Loves This Project

Jesse Clark

Cofounder of Marqo

Project Summary

Summary

Harness-1 is a 20-billion parameter search agent designed for long-horizon tasks, leveraging reinforcement learning within a stateful retrieval harness. It addresses the challenge of managing complex search states, enabling agents to make semantic decisions about searching, curating evidence, and verifying claims. This project is targeted at researchers and engineers seeking advanced AI search capabilities, offering a robust framework for stateful, recoverable search operations.

How It Works

The core innovation lies in its stateful retrieval harness, which meticulously maintains recoverable search state, including candidate documents, curated evidence, verification records, and budget-aware context. A reinforcement learning policy governs semantic decisions, dictating search queries, document inspection, evidence curation, and the determination of sufficient evidence. This approach allows for more sophisticated and persistent search trajectories compared to stateless models.

Quick Start & Requirements

For a minimal local smoke test, users need Linux with Python 3.11+, uv installed, and a CUDA-compatible NVIDIA GPU environment with vLLM and GPT-OSS support. The primary installation involves uv sync --extra vllm and setting the HARNESS1_HF_MODEL environment variable to pat-jj/harness-1. Full BrowseComp+ evaluation requires additional setup, including BrowseComp+ data files, a Chroma collection, and OpenAI API credentials. Detailed guides are available in docs/run_vllm_browsecompplus.md.

Highlighted Details

A 20B parameter model trained via reinforcement learning for search tasks.
Features a stateful harness for recoverable and persistent search state management.
Enables agents to make semantic decisions on search actions and evidence curation.
Supports local serving via vLLM and evaluation on the BrowseComp+ benchmark.

Maintenance & Community

The project is associated with authors Pengcheng Jiang, Zhiyi Shi, Kelly Hong, et al. Support and bug reporting are managed through the repository's issue tracker. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The README does not explicitly state the software license. This lack of clarity may pose a risk for commercial use or integration into closed-source projects, requiring further investigation.

Limitations & Caveats

Full BrowseComp+ evaluation necessitates a compatible Chroma retrieval backend and associated data, which are not bundled with the repository. Results may exhibit variance due to external retrieval and reranking services. Local serving requires a CUDA GPU with sufficient memory, with H100-class hardware being the validated configuration; other GPUs may function but are not guaranteed. Certain training and model export workflows depend on private Tinker checkpoints or hosted services.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

88 stars in the last 30 days