Tree-GRPO by AMAP-ML

LLM agent reinforcement learning with tree search

Created 3 months ago
262 stars

Top 97.2% on SourcePulse

Summary

Tree-GRPO is a tree-search rollout strategy for LLM agent RL, designed to be more efficient than chain-based rollouts. By sharing exploration across a tree and deriving supervision signals from the tree structure, it aims to learn more effectively under a reduced rollout budget and to shorten training cycles. It targets researchers and engineers developing LLM agents.

How It Works

Tree-GRPO constructs a search tree from ReAct step-level nodes, facilitating rollout sampling over this semantically structured tree. This contrasts with independent, chain-based rollouts, allowing more efficient state-action space exploration and providing a richer, tree-based supervision signal. The core advantage is achieving comparable or superior performance with a fraction of the rollout budget.
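
The budget argument above can be made concrete with a toy example. The following is a minimal sketch, not the repository's implementation: the `StepNode` class, the helper names, the demo rewards, and the normalization formula are all assumptions chosen for illustration. It counts how many steps a policy must generate when trajectories share prefixes in a tree versus when each trajectory is sampled as an independent chain, and it derives a GRPO-style group-relative signal both across full trajectories and across sibling steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class StepNode:
    """One ReAct step in the tree (hypothetical structure, for illustration only)."""
    text: str
    reward: float | None = None  # only terminal steps carry an outcome reward
    children: list[StepNode] = field(default_factory=list)

    def add_child(self, text: str, reward: float | None = None) -> StepNode:
        child = StepNode(text, reward)
        self.children.append(child)
        return child


def leaves(node: StepNode) -> list[StepNode]:
    """Return the terminal steps under a node, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def tree_steps(root: StepNode) -> int:
    """Steps generated when prefixes are shared (the root is the prompt, not a step)."""
    return sum(1 + tree_steps(child) for child in root.children)


def chain_steps(node: StepNode, depth: int = 0) -> int:
    """Steps generated if every root-to-leaf path were sampled independently."""
    if not node.children:
        return depth
    return sum(chain_steps(child, depth + 1) for child in node.children)


def subtree_value(node: StepNode) -> float:
    """Assumed step value: mean outcome reward of the trajectories through this step."""
    return mean(leaf.reward for leaf in leaves(node))


def group_relative(values: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize values against their own group (mean 0, roughly unit scale)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def build_demo_tree() -> StepNode:
    """Two first steps, each branching into two final answers (made-up rewards)."""
    root = StepNode("question")
    a = root.add_child("search: query A")
    b = root.add_child("search: query B")
    a.add_child("answer: A1", reward=1.0)
    a.add_child("answer: A2", reward=0.0)
    b.add_child("answer: B1", reward=0.0)
    b.add_child("answer: B2", reward=0.0)
    return root


root = build_demo_tree()
tree_cost = tree_steps(root)    # 6 steps: 2 shared first steps + 4 answers
chain_cost = chain_steps(root)  # 8 steps: 4 trajectories x 2 steps each

# Trajectory-level signal: compare the four outcomes against each other.
trajectory_adv = group_relative([leaf.reward for leaf in leaves(root)])

# Step-level signal: compare sibling first steps by the outcomes they lead to.
sibling_adv = group_relative([subtree_value(child) for child in root.children])

print(tree_cost, chain_cost)
print(trajectory_adv)
print(sibling_adv)
```

In this toy tree the same four trajectories cost six generated steps instead of eight, and the sibling comparison assigns credit to an intermediate step ("query A" versus "query B") that independent chains would only score at the trajectory level. How Tree-GRPO actually samples, expands, and scores nodes is defined by the repository and its paper, not by this sketch.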

Quick Start & Requirements

  • Requires separate Conda environments: treegrpo (Python 3.12.9) and retriever (Python 3.10.13).
  • Key dependencies: PyTorch (2.6.0), vLLM (0.8.5.post1), Flash Attention 2, FAISS-GPU (1.7.3), Transformers, Datasets, Pyserini, FastAPI, Uvicorn.
  • Setup involves downloading and processing datasets, then either launching local retrieval servers or integrating the Bing API.
  • Launch scripts (train_multihopqa_grpo.sh, train_multihopqa_tree_search.sh) are provided.
  • Training logs are tracked via SwanLab.
  • Links: [arXiv](
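
Because the requirements above pin exact versions across two environments, a quick sanity check after installation can catch mismatches early. The script below is a minimal sketch that is not part of the repository. The distribution names (`torch`, `vllm`, `faiss-gpu`) are assumptions about how the packages are registered, and the summary does not say which packages belong to which environment, so the pins are checked as one flat list. Run it inside each environment and ignore the entries that belong to the other one.

```python
import sys
from importlib import metadata

# Python versions per Conda environment, as listed in the requirements.
ENV_PYTHON = {"treegrpo": (3, 12, 9), "retriever": (3, 10, 13)}

# Pinned versions from the requirements; the distribution names are assumed.
PINNED = {"torch": "2.6.0", "vllm": "0.8.5.post1", "faiss-gpu": "1.7.3"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that is missing or off-pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore a local build tag such as "+cu124" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def python_matches(env_name, version_info=None):
    """Check the running interpreter against the version listed for an environment."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == ENV_PYTHON[env_name]


if __name__ == "__main__":
    for env_name in ENV_PYTHON:
        print(f"python matches {env_name}: {python_matches(env_name)}")
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

Passing `lookup` as a parameter keeps the comparison logic testable without installing any of the packages.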

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 10 stars in the last 30 days
