ZeroSearch by Alibaba-NLP

Research paper on incentivizing LLM search without real search engines

Created 8 months ago

1,223 stars

Top 32.1% on SourcePulse

1 Expert Loves This Project

Casper Hansen

Author of AutoAWQ

Project Summary

ZeroSearch is a reinforcement learning framework that strengthens the search capabilities of Large Language Models (LLMs) by simulating search interactions during training. It targets researchers and developers who want to improve LLM performance on information retrieval tasks without paying for real search API calls. A simulation LLM learns to generate both relevant and noisy documents that mimic real-world search results, and a curriculum rollout mechanism progressively improves the trained model's reasoning ability.

How It Works

ZeroSearch employs a two-stage approach. First, supervised fine-tuning turns an LLM into a retrieval module that generates simulated search results. Second, reinforcement learning (REINFORCE, GRPO, or PPO) incentivizes the LLM's search behavior against that simulated engine. Because no real search engine is queried, the model can learn from a large number of simulated searches at no API cost, and a curriculum learning strategy gradually increases the difficulty of the retrieval scenarios to foster robust reasoning.
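The curriculum idea can be illustrated with a small schedule that raises the chance of serving a noisy simulated document as training progresses. This is a minimal sketch, assuming an exponential ramp between a start and an end probability; the function names, parameter names, and default values (`p_start`, `p_end`, `base`) are illustrative assumptions and are not taken from the repository.

```python
import random


def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5,
                      base: float = 4.0) -> float:
    """Probability of serving a noisy document at a given training step.

    Assumed schedule (hypothetical, for illustration only):
        p = p_start + (base**(step/total_steps) - 1) / (base - 1) * (p_end - p_start)
    With base > 1 the ramp starts slowly and steepens toward the end.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if base <= 0 or base == 1.0:
        raise ValueError("base must be positive and different from 1")
    # Clamp progress to [0, 1] so out-of-range steps stay within the schedule.
    progress = min(max(step / total_steps, 0.0), 1.0)
    ramp = (base ** progress - 1.0) / (base - 1.0)
    return p_start + ramp * (p_end - p_start)


def pick_document_mode(step: int, total_steps: int, rng: random.Random) -> str:
    """Decide whether the simulated engine should return a useful or noisy document."""
    return "noisy" if rng.random() < noise_probability(step, total_steps) else "useful"


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 100):
        p = noise_probability(step, 100)
        print(step, round(p, 4), pick_document_mode(step, 100, rng))
```

Early in training the simulated engine mostly returns useful documents, so the model first learns the basic search-and-answer format; later steps expose it to more noise, which is the "gradually increasing complexity" described above.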

Quick Start & Requirements

Installation: Requires conda for environment management. Dependencies, including sglang, are installed with pip.

conda create -n zerosearch python=3.9
conda activate zerosearch
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3
pip install wandb
pip install serpapi
pip install -e .
pip3 install flash-attn --no-build-isolation
pip install "sglang[all]"

Prerequisites: Python 3.9, PyTorch 2.4.0 with CUDA 12.1, vLLM 0.6.3, wandb, serpapi, flash-attn, and sglang. Requires a Google Search API key for certain configurations.
Data/Models: Download training datasets and simulation LLMs from Hugging Face.
Resources: Training requires multiple GPUs (e.g., NUM_GPUS_PER_NODE 4).
Docs: https://github.com/Alibaba-NLP/ZeroSearch
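After installation, it can help to confirm that the pinned versions listed above are what actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical, and the distribution names are assumed to match the `pip install` names shown above.

```python
from importlib import metadata

# Versions pinned in the install commands above.
PINNED = {"torch": "2.4.0", "vllm": "0.6.3"}
# Packages installed without a version pin; only presence is checked.
UNPINNED = ["wandb", "serpapi", "flash-attn", "sglang"]


def check_environment(pinned=PINNED, unpinned=UNPINNED,
                      get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        # Ignore local build tags such as "+cu121" when comparing versions.
        if found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    for name in unpinned:
        try:
            get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

Run it inside the activated `zerosearch` environment. It reads installed package metadata only and does not import the packages, so it does not verify that CUDA or the GPUs are usable.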

Highlighted Details

Achieves zero API cost for training search-enhanced LLMs.
Outperforms models using real search engines in experiments.
Generalizes across different LLM sizes and types (base and instruction-tuned).
Supports multiple RL algorithms (REINFORCE, GRPO, PPO) and simulation methods (prompt-based, fine-tuning-based).

Maintenance & Community

The project was released in May 2025. Recent updates include new simulation LLMs, tuning datasets, and RL algorithm support. Contact: sunhao@stu.pku.edu.cn.

Licensing & Compatibility

The repository does not explicitly state a license in the README. This may pose compatibility issues for commercial or closed-source use.

Limitations & Caveats

The project is newly released (May 2025) and may be subject to rapid changes. The lack of a specified license requires clarification for any production use. The setup involves multiple complex dependencies and requires significant GPU resources for training.

Health Check

Last Commit

4 months ago

Responsiveness

1 day

Star History

22 stars in the last 30 days