AMAP-ML
LLM agent reinforcement learning with tree search
Top 97.2% on SourcePulse
Summary
Tree-GRPO introduces a tree-search rollout strategy for LLM agent RL that is more efficient than chain-based rollouts. It targets researchers and engineers building LLM agents who need effective learning and decision-making under a reduced computational budget. The project attributes its performance gains and faster training cycles to better exploration and richer supervision signals.
How It Works
Tree-GRPO constructs a search tree from ReAct step-level nodes, facilitating rollout sampling over this semantically structured tree. This contrasts with independent, chain-based rollouts, allowing more efficient state-action space exploration and providing a richer, tree-based supervision signal. The core advantage is achieving comparable or superior performance with a fraction of the rollout budget.
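The idea can be illustrated with a toy example. The following is a minimal sketch, not the project's implementation: the `Node` class, the helper names, and the sibling-mean baseline are assumptions chosen for illustration. Each node stands for one ReAct step, leaves carry an outcome reward, and every child is scored against the mean value of its siblings, so steps that share a prefix receive step-level signals. The sketch also counts generated steps to show how shared prefixes reduce the rollout budget compared with independent chains.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Node:
    """One ReAct step (hypothetical structure); only leaves carry a reward."""
    name: str
    reward: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the outcome rewards of all leaves below (or at) this node."""
    if not node.children:
        return [node.reward]
    return [r for child in node.children for r in leaf_rewards(child)]


def sibling_advantages(node: Node, out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Score each child as (mean leaf reward under it) minus the sibling mean.

    This sibling-relative baseline is an illustrative assumption.
    """
    if out is None:
        out = {}
    if node.children:
        values = [mean(leaf_rewards(child)) for child in node.children]
        baseline = mean(values)
        for child, v in zip(node.children, values):
            out[child.name] = v - baseline
            sibling_advantages(child, out)
    return out


def count_steps(node: Node) -> int:
    """Number of generated steps in the tree, excluding the root prompt."""
    return sum(1 + count_steps(child) for child in node.children)


# Toy tree: two first steps, each expanded into two final steps.
root = Node("root", children=[
    Node("a", children=[Node("a1", reward=1.0), Node("a2", reward=0.0)]),
    Node("b", children=[Node("b1", reward=0.0), Node("b2", reward=0.0)]),
])

adv = sibling_advantages(root)
tree_steps = count_steps(root)               # shared prefixes are generated once
chain_steps = len(leaf_rewards(root)) * 2    # 4 independent chains of depth 2
print(adv, tree_steps, chain_steps)
```

In this toy tree, step `a` is preferred over `b` even though only one of its continuations succeeds, which is the kind of intermediate signal a single trajectory-level reward would not provide. The four trajectories cost six generated steps in the tree versus eight as independent chains.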
Quick Start & Requirements
Environments: treegrpo (Python 3.12.9) and retriever (Python 3.10.13).
Scripts (train_multihopqa_grpo.sh, train_multihopqa_tree_search.sh) provided.