DeepResearcher by GAIR-NLP

LLM research agent trained with reinforcement learning

Created 10 months ago

703 stars

Top 48.6% on SourcePulse

Project Summary

DeepResearcher is a framework for end-to-end training of LLM-based research agents with reinforcement learning (RL) in real-world web environments. Trained agents carry out complex research tasks and exhibit emergent cognitive behaviors such as planning, information cross-validation, and self-reflection. The project targets researchers and advanced users who want automated deep-dive analysis.

How It Works

The framework uses reinforcement learning to train LLM agents through authentic web search interactions. Agents learn research strategies directly from real-world search results, and capabilities such as planning, information synthesis, and self-correction emerge during training. These capabilities are crucial for robust, real-world research tasks.
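The interaction pattern described above can be illustrated with a toy rollout loop. This is a minimal sketch under stated assumptions, not the project's implementation: the names `rollout`, `toy_policy`, `toy_search`, and `exact_match` are hypothetical, the search backend is an in-memory dictionary standing in for live web search, and the exact-match reward is a placeholder for whatever reward the actual training uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    action: str            # "search" or "answer"
    content: str           # query text or final answer
    observation: str = ""  # search result returned to the agent


@dataclass
class Trajectory:
    question: str
    steps: List[Step] = field(default_factory=list)
    reward: float = 0.0


def rollout(
    question: str,
    policy: Callable[[str, List[Step]], Tuple[str, str]],
    search: Callable[[str], str],
    score: Callable[[str, str], float],
    gold: str,
    max_turns: int = 4,
) -> Trajectory:
    """Run one episode: the policy alternates between searching and answering."""
    traj = Trajectory(question)
    for _ in range(max_turns):
        action, content = policy(question, traj.steps)
        if action == "answer":
            traj.steps.append(Step(action, content))
            traj.reward = score(content, gold)  # outcome reward on the final answer
            return traj
        observation = search(content)
        traj.steps.append(Step(action, content, observation))
    return traj  # no answer within the turn budget: reward stays 0.0


# Hypothetical stand-ins for the LLM policy, the web search tool, and the reward.
def toy_search(query: str) -> str:
    corpus = {"capital of france": "Paris"}
    return corpus.get(query.lower(), "no results")


def toy_policy(question: str, steps: List[Step]) -> Tuple[str, str]:
    if not steps:
        return "search", question          # first turn: issue a query
    return "answer", steps[-1].observation  # then answer from the last result


def exact_match(prediction: str, gold: str) -> float:
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


traj = rollout("capital of France", toy_policy, toy_search, exact_match, "Paris")
```

In an RL setup, many such trajectories would be collected per question and their rewards used to update the policy; that update step is omitted here because the summary does not describe it.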

Quick Start & Requirements

Installation: Clone the repository, create and activate a Conda environment (python=3.10), install PyTorch (torch==2.4.0 with cu124), flash-attn, and the package itself (pip install -e .).
Prerequisites: CUDA 12.4 (for PyTorch), Python 3.10, Ray for distributed training, and API keys for search engines (Serper or Azure Bing) and potentially LLM providers (e.g., Qwen-Plus).
Setup: Requires configuring API keys in scrl/handler/config.yaml and scrl/handler/server_handler.py, starting a Ray head node, and running server handlers before training or evaluation.
Resources: Training and inference require substantial compute, as is typical for large language models trained with RL.
Links: Hugging Face Hub
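Before starting Ray or the server handlers, it can help to confirm that the environment matches the versions listed above. The following is a small preflight sketch, not part of the repository: `check_environment` and `installed_version` are hypothetical helpers, and the check covers only the Python, PyTorch, and Ray requirements named in this section (it does not validate API keys or the contents of scrl/handler/config.yaml, whose schema is not described here).

```python
import sys
from importlib import metadata
from typing import List, Optional, Sequence

REQUIRED_PYTHON = (3, 10)   # from the stated prerequisites
REQUIRED_TORCH = "2.4.0"    # from the stated installation step
REQUIRED_CUDA_TAG = "cu124"  # wheel suffix for CUDA 12.4 builds


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(
    python_version: Sequence[int],
    torch_version: Optional[str],
    ray_version: Optional[str],
) -> List[str]:
    """Compare detected versions with the stated requirements; return problems."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(f"Python 3.10 expected, found {python_version[0]}.{python_version[1]}")
    if torch_version is None:
        problems.append("torch is not installed")
    else:
        base, _, local = torch_version.partition("+")
        if base != REQUIRED_TORCH:
            problems.append(f"torch {REQUIRED_TORCH} expected, found {base}")
        # The local tag (e.g. "cu124") is only present on some wheels.
        if local and local != REQUIRED_CUDA_TAG:
            problems.append(f"torch build {REQUIRED_CUDA_TAG} expected, found {local}")
    if ray_version is None:
        problems.append("ray is not installed")
    return problems


if __name__ == "__main__":
    found = check_environment(
        sys.version_info, installed_version("torch"), installed_version("ray")
    )
    for line in found or ["environment matches the stated requirements"]:
        print(line)
```

Keeping `check_environment` free of side effects makes it easy to test with fixed version strings, while the `__main__` block does the actual detection.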

Highlighted Details

Achieves improvements of up to 28.9 points over prompt-engineering baselines and 7.2 points over RAG-based RL agents on open-domain research tasks.
Demonstrates emergent cognitive behaviors including planning, cross-validation, self-reflection, and honesty.
Emphasizes the necessity of end-to-end training in real-world web environments for robust research capabilities.

Maintenance & Community

The project is inspired by DeepSeek-R1 and built on veRL and Search-R1. The README provides no community links (Discord, Slack) and no signals of active maintenance.

Licensing & Compatibility

The repository does not explicitly state a license. The citation format suggests it is a research artifact, so commercial or closed-source use would require clarification from the authors.

Limitations & Caveats

The setup process is complex, requiring multiple API keys and specific environment configurations. The project is presented as a research artifact, and its stability, long-term maintenance, and production readiness are not detailed.

Explore Similar Projects

LLM-with-RL-papers by floodsung

LLM-Agent-Survey by xinzhel

Awesome-Agent-RL by 0russwest0

Awesome-RL-based-LLM-Reasoning by bruno686

ASearcher by inclusionAI

R1-Searcher by RUCAIBox

Agentic-RAG-R1 by jiangxinke

dr-tulu by rlresearch

ZeroSearch by Alibaba-NLP

Awesome-RL-for-LRMs by TsinghuaC3I

WebThinker by RUC-NLPIR

Awesome-LLM-Post-training by mbzuai-oryx