Research paper on incentivizing LLM search without real search engines
ZeroSearch is a reinforcement learning framework designed to enhance the search capabilities of Large Language Models (LLMs) by simulating search interactions during training. It targets researchers and developers aiming to improve LLM performance on information retrieval tasks without incurring real search API costs. The framework allows LLMs to learn to generate relevant and even noisy documents, mimicking real-world search results, and progressively improves their reasoning abilities through a curriculum rollout mechanism.
How It Works
ZeroSearch employs a two-stage approach. First, it uses supervised fine-tuning to turn an LLM into a retrieval module that generates simulated search results. Second, it applies reinforcement learning (REINFORCE, GRPO, or PPO) to further incentivize the LLM's search behavior. Because the search results are simulated, models can learn from a large number of "searches" without API costs, and a curriculum learning strategy gradually increases the difficulty of the retrieval scenarios to foster robust reasoning.
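The curriculum idea can be illustrated with a small sketch: the probability that the simulated retriever returns a noisy document starts low and rises as training proceeds. The exponential shape, the function name `noise_probability`, and the defaults `p_start`, `p_end`, and `base` are assumptions chosen for illustration; this summary does not specify the actual schedule.

```python
def noise_probability(
    step: int,
    total_steps: int,
    p_start: float = 0.0,   # hypothetical starting noise probability
    p_end: float = 0.5,     # hypothetical final noise probability
    base: float = 4.0,      # hypothetical growth base (> 1 gives a slow start)
) -> float:
    """Return the chance of requesting a noisy simulated document at `step`.

    Illustrative schedule only: interpolates from p_start to p_end along
    an exponential curve, so early training sees mostly useful documents.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if base <= 0 or base == 1:
        raise ValueError("base must be positive and different from 1")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    frac = min(max(step / total_steps, 0.0), 1.0)
    growth = (base ** frac - 1.0) / (base - 1.0)
    return p_start + growth * (p_end - p_start)


if __name__ == "__main__":
    for step in (0, 50, 100, 150, 200):
        print(step, round(noise_probability(step, 200), 3))
```

A rollout loop could draw a random number per simulated search and ask for a noisy document whenever it falls below this probability, which keeps early episodes easy and later ones harder.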
Quick Start & Requirements
Use conda for environment management and install the dependencies, including sglang, with pip:
conda create -n zerosearch python=3.9
conda activate zerosearch
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3
pip install wandb
pip install serpapi
pip install -e .
pip3 install flash-attn --no-build-isolation
pip install sglang[all]
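After installation, a quick check that the pinned versions landed in the active environment can save debugging time later. The following is a minimal sketch using only the standard library; the helper name `check_versions` and the `EXPECTED` table are illustrative, with the pins copied from the commands above.

```python
from importlib import metadata

# Pins taken from the install commands; extend as needed.
EXPECTED = {"torch": "2.4.0", "vllm": "0.6.3"}


def check_versions(expected: dict) -> dict:
    """Map each package name to 'ok', 'missing', or 'mismatch (<found>)'."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # CUDA wheels may report a local suffix such as "2.4.0+cu121";
        # compare only the part before the "+".
        if found.split("+")[0] == wanted:
            report[name] = "ok"
        else:
            report[name] = f"mismatch ({found})"
    return report


if __name__ == "__main__":
    for package, status in check_versions(EXPECTED).items():
        print(f"{package}: {status}")
```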
Multi-GPU training is expected (e.g., NUM_GPUS_PER_NODE 4).
Highlighted Details
Maintenance & Community
The project was released in May 2025. Recent updates include new simulation LLMs, tuning datasets, and RL algorithm support. Contact: sunhao@stu.pku.edu.cn.
Licensing & Compatibility
The repository does not explicitly state a license in the README. This may pose compatibility issues for commercial or closed-source use.
Limitations & Caveats
The project is newly released (May 2025) and may be subject to rapid changes. The lack of a specified license requires clarification for any production use. The setup involves multiple complex dependencies and requires significant GPU resources for training.