Research paper on LLM self-training via tree search
This repository implements ReST-MCTS*, a novel self-training framework for Large Language Models (LLMs) that leverages Monte Carlo Tree Search (MCTS) guided by inferred process rewards. It targets researchers and developers aiming to improve LLM reasoning capabilities by generating higher-quality training data without extensive manual annotation.
How It Works
ReST-MCTS* integrates MCTS with reinforcement learning to infer per-step rewards that guide the generation of reasoning traces. By using oracle final answers, it estimates the probability that each step contributes to the correct solution. These inferred rewards serve as value targets for refining the reward model and as a selection criterion for high-quality traces used to train the policy model. This approach automates the collection of valuable training data for self-improvement.
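The step-value estimation described above can be sketched as follows. This is a simplified illustration, not the repository's code: `estimate_step_value`, `rollout_fn`, and `check_answer` are hypothetical names, and ReST-MCTS* trains a process reward model on such targets rather than performing raw rollouts at every step.

```python
def estimate_step_value(partial_trace, rollout_fn, check_answer, n_rollouts=8):
    """Monte Carlo estimate of the probability that a partial reasoning
    trace eventually reaches the correct final answer.

    Completes the trace n_rollouts times and counts how often the
    oracle-checked answer comes out correct. The resulting fraction can
    serve as a value target for a process reward model and as a quality
    score for selecting traces to train the policy model.
    """
    correct = sum(
        1 for _ in range(n_rollouts)
        if check_answer(rollout_fn(partial_trace))
    )
    return correct / n_rollouts

# Toy usage: a "model" that always completes the trace to the answer "42".
value = estimate_step_value(
    ["step 1: ..."],
    rollout_fn=lambda trace: "42",
    check_answer=lambda answer: answer == "42",
)
# value is 1.0 for this deterministic toy rollout
```

Steps whose estimated value falls below a threshold can then be pruned, and high-value traces kept as self-training data.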
Quick Start & Requirements
Install dependencies with pip install -r requirements_mistral.txt (for Mistral/Llama) or pip install -r requirements_sciglm.txt (for SciGLM). A compatible version of the transformers library may be needed for certain Hugging Face models. Run MCTS/task.py for single questions (e.g., python MCTS/task.py) or evaluate.py for benchmark evaluation.
Maintenance & Community
The project is associated with THUDM (Tsinghua University). Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README excerpt. Compatibility for commercial use or closed-source linking would require clarification of the license.
Limitations & Caveats
The setup requires careful management of Python and transformers library versions across the different model backbones. Instructions for training custom value models are provided, but users may need to adapt the code for unsupported models.