R1-Searcher  by RUCAIBox

RL framework for incentivizing LLM search via outcome supervision

created 5 months ago
604 stars

Top 55.0% on sourcepulse

View on GitHub
Project Summary

R1-Searcher enables Large Reasoning Models (LRMs) to effectively invoke and utilize web search for knowledge-intensive tasks. It targets researchers and developers aiming to enhance LLM reasoning capabilities, particularly for multi-hop and time-sensitive questions, by providing a reinforcement learning framework that doesn't require instruction fine-tuning.

How It Works

The project employs a two-stage outcome-supervised reinforcement learning approach. Stage 1 trains the model to invoke search using only format rewards. Stage 2 further refines this by teaching the model to effectively use retrieved information, incorporating both format and answer rewards. This method leverages Reinforce++ as the RL algorithm and relies on carefully designed rewards to guide the learning process, avoiding complex prompt engineering or process supervision.
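The two-stage reward scheme above can be sketched as follows. This is an illustrative reconstruction, not the repo's actual code: the tag names, reward values, and the containment-based answer check are assumptions standing in for whatever scoring the training scripts implement.

```python
import re

def format_reward(response: str) -> float:
    """Stage-1 signal: 1.0 if the rollout invokes search with well-formed
    query tags and wraps a final answer, else 0.0 (hypothetical tag names)."""
    has_query = bool(re.search(r"<begin_of_query>.*?<end_of_query>", response, re.S))
    has_answer = bool(re.search(r"<answer>.*?</answer>", response, re.S))
    return 1.0 if has_query and has_answer else 0.0

def answer_reward(response: str, gold: str) -> float:
    """Outcome signal: crude containment check against the gold answer,
    a stand-in for the exact-match/F1 scoring a real reward server would use."""
    m = re.search(r"<answer>(.*?)</answer>", response, re.S)
    if not m:
        return 0.0
    return 1.0 if gold.lower() in m.group(1).lower() else 0.0

def stage_reward(stage: int, response: str, gold: str) -> float:
    # Stage 1 trains search invocation with format rewards only;
    # Stage 2 adds the answer reward so retrieved info must actually be used.
    reward = format_reward(response)
    if stage == 2:
        reward += answer_reward(response, gold)
    return reward
```

These scalar outcome rewards are what the Reinforce++ policy-gradient updates would consume, avoiding any per-step process supervision.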

Quick Start & Requirements

  • Install: Create a conda environment (conda create --name r1-searcher python=3.10.16), activate it, and install dependencies: pip install vllm==0.6.5 packaging ninja flash-attn deepspeed accelerate datasets.
  • Prerequisites: Python 3.10.16, CUDA-enabled GPU (for flash-attn, vllm, and embedding/indexing).
  • Data Prep: Requires downloading and indexing Wikipedia corpus (KILT dataset).
  • Resources: Training involves multiple servers for Ray, reward servers, and model rollouts. Evaluation requires local search setup and potentially online search access.
  • Links: Arxiv, Model Checkpoints, Training Data.
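The install steps above can be collected into a single setup script (package versions are those listed in the bullets; everything else here, such as flag choices, is an assumption):

```shell
# Create and activate the environment with the pinned Python version
conda create --name r1-searcher python=3.10.16 -y
conda activate r1-searcher

# Install dependencies; flash-attn and vllm assume a CUDA-capable GPU
# and a matching CUDA toolchain on the machine
pip install vllm==0.6.5 packaging ninja flash-attn deepspeed accelerate datasets
```

Corpus indexing and the Ray/reward-server topology for training are configured separately per the repo's own scripts.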

Highlighted Details

  • Achieves significant performance improvements on benchmarks like HotpotQA, 2WikiMultiHopQA, Musique, and Bamboogle, outperforming existing methods and even closed-source models like GPT-4o-mini.
  • Demonstrates strong generalization capabilities to out-of-domain datasets and online search scenarios.
  • LongCoT reasoning after RL is presented as a more efficient scaling method than tree-search approaches.
  • Compatible with both Base LLMs and Chat LLMs, and can train from scratch on Base LLMs.

Maintenance & Community

The project is associated with RUCAIBox, a research group. Contact email provided for questions. No explicit community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

Released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The README notes that the capability of the base LLM largely determines whether training can start directly from zero. Performance on online search is evaluated only for Bamboogle; broader online search integration is not extensively detailed.

Health Check
Last commit

2 months ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
1
Star History
117 stars in the last 90 days

Explore Similar Projects

Starred by Jason Liu (Author of Instructor) and Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code).

Search-R1 by PeterGriffinJin

Top 1.1%
3k stars
RL framework for training LLMs to use search engines
created 5 months ago
updated 3 weeks ago