ZeroSearch by Alibaba-NLP

Research paper on incentivizing LLM search without real search engines

created 2 months ago
1,091 stars

Top 35.5% on sourcepulse

Project Summary

ZeroSearch is a reinforcement learning framework designed to enhance the search capabilities of Large Language Models (LLMs) by simulating search interactions during training. It targets researchers and developers aiming to improve LLM performance on information retrieval tasks without incurring real search API costs. The framework uses an auxiliary LLM to generate both relevant and deliberately noisy documents, mimicking real-world search results, and progressively improves the policy model's reasoning abilities through a curriculum rollout mechanism.

How It Works

ZeroSearch employs a two-stage approach. First, it uses supervised fine-tuning to transform an LLM into a retrieval module that can generate simulated search results. Second, it applies reinforcement learning (REINFORCE, GRPO, PPO) to incentivize the policy LLM's search behavior. This simulation-based training lets models learn from a vast number of "searches" without API costs, and a curriculum learning strategy gradually increases the difficulty of retrieval scenarios to foster robust reasoning.
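The curriculum rollout can be pictured as a schedule that raises the share of noisy simulated documents as training progresses. The sketch below is a minimal illustration only; the function name and the linear ramp are assumptions, not the repo's actual schedule:

```python
def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5) -> float:
    """Fraction of simulated documents that should be noisy at this step.

    Hypothetical linear schedule: early rollouts see mostly useful
    documents, later rollouts must cope with noisier retrieval.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return p_start + (p_end - p_start) * frac
```

At each rollout, a simulated document would be sampled as noisy with this probability, so the policy model faces progressively harder retrieval scenarios.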

Quick Start & Requirements

  • Installation: Requires conda for environment management. Install dependencies via pip, including sglang.
    # Create and activate an isolated environment
    conda create -n zerosearch python=3.9
    conda activate zerosearch
    # Core dependencies: PyTorch (CUDA 12.1 build), vLLM, logging, search tooling
    pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
    pip install vllm==0.6.3
    pip install wandb
    pip install serpapi
    # Install the repo itself in editable mode
    pip install -e .
    pip3 install flash-attn --no-build-isolation
    # Quote the extras to avoid shell glob expansion (e.g., in zsh)
    pip install "sglang[all]"
    
  • Prerequisites: Python 3.9, PyTorch 2.4.0 with CUDA 12.1, vLLM 0.6.3, wandb, serpapi, flash-attn, and sglang. Requires a Google Search API key for certain configurations.
  • Data/Models: Download training datasets and simulation LLMs from Hugging Face.
  • Resources: Training requires multiple GPUs (e.g., NUM_GPUS_PER_NODE=4).
  • Docs: https://github.com/Alibaba-NLP/ZeroSearch

Highlighted Details

  • Achieves zero API cost for training search-enhanced LLMs.
  • Outperforms models using real search engines in experiments.
  • Generalizes across different LLM sizes and types (base and instruction-tuned).
  • Supports multiple RL algorithms (REINFORCE, GRPO, PPO) and simulation methods (prompt-based, fine-tuning-based).
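The prompt-based simulation method can be illustrated with a short sketch: the simulation LLM is prompted to act as a search engine and asked for either useful or noisy documents. The template below is hypothetical, not the repo's actual prompt:

```python
def build_simulation_prompt(query: str, noisy: bool) -> str:
    """Build a prompt asking a simulation LLM to act as a search engine.

    Hypothetical template: ZeroSearch's real prompts differ, but the core
    idea is that a single flag switches between useful and noisy output.
    """
    style = "irrelevant or misleading" if noisy else "relevant and helpful"
    return (
        "You are a search engine. Given the query below, write five short "
        f"documents that are {style} for answering it.\n"
        f"Query: {query}\nDocuments:"
    )
```

During rollout, the `noisy` flag would be sampled with the curriculum's current noise probability, so the policy model sees an increasing share of unhelpful retrieval results over training.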

Maintenance & Community

The project was released in May 2025. Recent updates include new simulation LLMs, tuning datasets, and RL algorithm support. Contact: sunhao@stu.pku.edu.cn.

Licensing & Compatibility

The repository does not explicitly state a license in the README. This may pose compatibility issues for commercial or closed-source use.

Limitations & Caveats

The project is newly released (May 2025) and may be subject to rapid changes. The lack of a specified license requires clarification for any production use. The setup involves multiple complex dependencies and requires significant GPU resources for training.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 1,108 stars in the last 90 days

Explore Similar Projects

Starred by Jason Liu (Author of Instructor) and Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code).

Search-R1 by PeterGriffinJin

  • RL framework for training LLMs to use search engines
  • 3k stars (top 1.1%)
  • created 5 months ago; updated 3 weeks ago