ASearcher by inclusionAI

Framework for large-scale reinforcement learning of search agents

Created 1 month ago
408 stars

Top 71.5% on SourcePulse

Project Summary

ASearcher is an open-source framework for large-scale online reinforcement learning (RL) training of search agents, aiming to achieve expert-level Search Intelligence. It targets developers and researchers seeking to build high-performance, cost-effective search agents, offering released model weights, training methodologies, and data synthesis pipelines.

How It Works

ASearcher employs a fully asynchronous agentic RL approach, decoupling trajectory collection from model training to eliminate GPU idle time and enable efficient long-horizon RL. It also features a novel prompt-based LLM agent for autonomously generating diverse and challenging QA pairs, enhancing training data quality and complexity. This asynchronous design allows for extended tool calls and token generation per trajectory, leading to more robust agent behavior.
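The decoupling described above can be sketched with a minimal producer/consumer pattern: rollout workers push finished trajectories into a shared buffer while the learner consumes them as they arrive, so neither side blocks on the other. This is an illustrative sketch only; all names here are hypothetical and do not reflect ASearcher's or AReaL's actual API.

```python
import queue
import threading

# Shared trajectory buffer between rollout workers and the learner.
trajectory_buffer = queue.Queue(maxsize=64)

def collect_rollouts(worker_id, n_trajectories):
    """Simulate a rollout worker producing trajectories of varying length."""
    for step in range(n_trajectories):
        trajectory = {"worker": worker_id, "tokens": [step] * (step + 1)}
        trajectory_buffer.put(trajectory)  # blocks only if the buffer is full

def train(n_updates):
    """Learner consumes trajectories as they become available."""
    processed = 0
    while processed < n_updates:
        traj = trajectory_buffer.get()  # waits for the next finished trajectory
        processed += 1  # a real learner would compute a gradient update here
    return processed

# Two workers collect trajectories concurrently while the learner trains.
workers = [threading.Thread(target=collect_rollouts, args=(i, 4)) for i in range(2)]
for w in workers:
    w.start()
updates = train(8)
for w in workers:
    w.join()
print(updates)  # 8
```

Because the learner never waits for a full synchronous batch of rollouts, individual trajectories are free to run long (many tool calls, many generated tokens) without stalling GPU training, which is the property the asynchronous design exploits.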

Quick Start & Requirements

  • Evaluation: Requires API keys for Serper and Jina, and test data from Huggingface. Evaluation script provided.
  • Training: Requires a runtime environment (see the AReaL tutorial) plus Serper and Jina API keys. Training commands are provided for 7B models on 16 nodes (recommended) or a single node. 32B model training details are forthcoming.
  • Data Synthesis: Requires downloading Wikipedia 2018 webpages and sampled links, and launching SGLang servers for QwQ-32B and Qwen2.5-72B-instruct models.
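A typical setup step before evaluation or training is exporting the API keys the agent's tools need. The variable names and commands below are assumptions for illustration, not taken from the ASearcher repository; consult its README for the exact names.

```shell
# Hypothetical key setup -- variable names are assumptions, check the repo's README.
export SERPER_API_KEY="your-serper-key"   # web search API
export JINA_API_KEY="your-jina-key"       # page reading API

# Test data comes from Huggingface; the dataset id below is a placeholder.
# huggingface-cli download <dataset-id> --repo-type dataset --local-dir data/

echo "keys set: ${SERPER_API_KEY:+serper} ${JINA_API_KEY:+jina}"
```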

Highlighted Details

  • Achieves Avg@4 scores of 52.8, 42.1, and 70.9 on GAIA, xBench-DeepSearch, and Frames respectively, surpassing other open-source agents at the 32B scale.
  • Demonstrates substantial RL improvements of +9.1, +13.4, and +12.0 Avg@4 on GAIA, xBench-DeepSearch, and Frames.
  • Enables long-horizon search with tool calls exceeding 40 rounds and generated tokens surpassing 150k during RL training.
  • Fully open-sourced components include datasets, data synthesis agent, training details, and model weights.

Maintenance & Community

Primary contributors are from the RL Lab at Ant Research and Tsinghua University. The project acknowledges assistance from the AWorld team and the Super Computing Technology (SCT) team at Ant Group, and cites Search-o1, Search-R1, and WebAgent as inspirations.

Licensing & Compatibility

The project is released under a permissive license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Details for fine-tuning a QwQ-32B agent are marked as "coming soon." The single-node training for a 7B model is noted as potentially slow.

Health Check
Last Commit

18 hours ago

Responsiveness

Inactive

Pull Requests (30d)
6
Issues (30d)
20
Star History
124 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (coauthor of AutoGen; Research Scientist at Microsoft Research) and Will Brown (Research Lead at Prime Intellect).

agent-lightning by microsoft

6.0%
2k
Train any AI agent with rollouts and feedback
Created 3 months ago
Updated 2 days ago
Starred by Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), Amanpreet Singh (Cofounder of Contextual AI), and 2 more.

coach by IntelLabs

0%
2k
Reinforcement learning framework for experimentation (discontinued)
Created 8 years ago
Updated 2 years ago