ASearcher by inclusionAI

Framework for large-scale reinforcement learning of search agents

Created 1 month ago
408 stars

Top 71.5% on SourcePulse

Project Summary

ASearcher is an open-source framework for large-scale online reinforcement learning (RL) training of search agents, aiming to achieve expert-level Search Intelligence. It targets developers and researchers seeking to build high-performance, cost-effective search agents, offering released model weights, training methodologies, and data synthesis pipelines.

How It Works

ASearcher employs a fully asynchronous agentic RL approach, decoupling trajectory collection from model training to eliminate GPU idle time and enable efficient long-horizon RL. It also features a novel prompt-based LLM agent for autonomously generating diverse and challenging QA pairs, enhancing training data quality and complexity. This asynchronous design allows for extended tool calls and token generation per trajectory, leading to more robust agent behavior.
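The decoupling described above can be sketched with a minimal producer/consumer pattern: rollout workers push finished trajectories into a shared buffer while the learner consumes them as they arrive, so neither side blocks on the other. This is an illustrative sketch only; all names here are hypothetical and do not reflect ASearcher's or AReaL's actual API.

```python
import queue
import threading

# Shared trajectory buffer between rollout workers and the learner.
trajectory_buffer = queue.Queue(maxsize=64)

def collect_rollouts(worker_id, n_trajectories):
    """Simulate a rollout worker producing trajectories of varying length."""
    for step in range(n_trajectories):
        trajectory = {"worker": worker_id, "tokens": [step] * (step + 1)}
        trajectory_buffer.put(trajectory)  # blocks only if the buffer is full

def train(n_updates):
    """Learner consumes trajectories as they become available."""
    processed = 0
    while processed < n_updates:
        traj = trajectory_buffer.get()  # waits for the next finished trajectory
        processed += 1  # a real learner would compute a gradient update here
    return processed

# Two workers collect trajectories concurrently while the learner trains.
workers = [threading.Thread(target=collect_rollouts, args=(i, 4)) for i in range(2)]
for w in workers:
    w.start()
updates = train(8)
for w in workers:
    w.join()
print(updates)  # 8
```

Because the learner never waits for a full synchronous batch of rollouts, individual trajectories are free to run long (many tool calls, many generated tokens) without stalling GPU training, which is the property the asynchronous design exploits.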

Quick Start & Requirements

  • Evaluation: Requires API keys for Serper and Jina, and test data from Huggingface. Evaluation script provided.
  • Training: Requires a runtime environment (see the AReaL tutorial) plus Serper and Jina API keys. Training commands are provided for 7B models on 16 nodes (recommended) or a single node. 32B model training details are forthcoming.
  • Data Synthesis: Requires downloading Wikipedia 2018 webpages and sampled links, and launching SGLang servers for QwQ-32B and Qwen2.5-72B-instruct models.
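A typical setup step before evaluation or training is exporting the API keys the agent's tools need. The variable names and commands below are assumptions for illustration, not taken from the ASearcher repository; consult its README for the exact names.

```shell
# Hypothetical key setup -- variable names are assumptions, check the repo's README.
export SERPER_API_KEY="your-serper-key"   # web search API
export JINA_API_KEY="your-jina-key"       # page reading API

# Test data comes from Huggingface; the dataset id below is a placeholder.
# huggingface-cli download <dataset-id> --repo-type dataset --local-dir data/

echo "keys set: ${SERPER_API_KEY:+serper} ${JINA_API_KEY:+jina}"
```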

Highlighted Details

  • Achieves Avg@4 scores of 52.8, 42.1, and 70.9 on GAIA, xBench-DeepSearch, and Frames respectively, surpassing other open-source agents at the 32B scale.
  • Demonstrates substantial RL improvements of +9.1, +13.4, and +12.0 Avg@4 on GAIA, xBench-DeepSearch, and Frames.
  • Enables long-horizon search with tool calls exceeding 40 rounds and generated tokens surpassing 150k during RL training.
  • Fully open-sourced components include datasets, data synthesis agent, training details, and model weights.

Maintenance & Community

Primary contributors are from the RL Lab at Ant Research and Tsinghua University. The project acknowledges assistance from the AWorld team and the Super Computing Technology (SCT) team at Ant Group, and cites Search-o1, Search-R1, and WebAgent as inspirations.

Licensing & Compatibility

The project is released under a permissive license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Details for fine-tuning a QwQ-32B agent are marked as "coming soon." The single-node training for a 7B model is noted as potentially slow.

Health Check
Last Commit

18 hours ago

Responsiveness

Inactive

Pull Requests (30d)
6
Issues (30d)
20
Star History
124 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (coauthor of AutoGen; Research Scientist at Microsoft Research) and Will Brown (Research Lead at Prime Intellect).

agent-lightning by microsoft

6.0%
2k
Train any AI agent with rollouts and feedback
Created 3 months ago
Updated 2 days ago
Starred by Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), Amanpreet Singh (Cofounder of Contextual AI), and 2 more.

coach by IntelLabs

0%
2k
Reinforcement learning framework for experimentation (discontinued)
Created 8 years ago
Updated 2 years ago