Research code for AlphaZero-like tree search to guide LLM decoding and training
This repository provides TS-LLM, an AlphaZero-like tree-search framework for guiding Large Language Model (LLM) decoding and training. It is designed for researchers and practitioners looking to enhance LLM performance on complex reasoning tasks through structured search.
How It Works
TS-LLM integrates Monte Carlo Tree Search (MCTS) with LLMs by employing learned policy and value networks. These networks, trained via methods akin to AlphaZero, guide the MCTS exploration of the LLM's output space. This approach allows for more efficient and effective search compared to standard decoding methods, particularly for tasks requiring multi-step reasoning.
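As a minimal sketch of the general idea (not the repository's actual implementation: `policy_value`, the toy three-token vocabulary, and all other names below are illustrative stand-ins for the real LLM and value network), an AlphaZero-style PUCT rule selects child tokens by combining the learned prior with the backed-up value:

```python
import math
import random

class Node:
    """One node in the search tree; a path from the root spells a token sequence."""
    def __init__(self, prior):
        self.prior = prior          # policy probability assigned to this token
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}          # token -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def policy_value(sequence):
    """Toy stand-in for the learned policy/value networks.

    A real system would run the LLM (plus a value head) on the sequence and
    return per-token priors and a scalar value estimate.
    """
    vocab = ["a", "b", "<eos>"]
    priors = {tok: 1.0 / len(vocab) for tok in vocab}
    return priors, random.uniform(-1, 1)

def select_child(node, c_puct=1.5):
    """AlphaZero-style PUCT: balance backed-up value against the prior."""
    def score(child):
        u = c_puct * child.prior * math.sqrt(node.visit_count) / (1 + child.visit_count)
        return child.value() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def simulate(root, sequence, max_depth=8):
    """One MCTS simulation: select to a leaf, expand it, back up the value."""
    node, path = root, []
    while node.children and len(sequence) + len(path) < max_depth:
        token, node = select_child(node)
        path.append((token, node))
    priors, value = policy_value(sequence + [t for t, _ in path])
    for token, prior in priors.items():       # expansion
        node.children.setdefault(token, Node(prior))
    for _, n in [(None, root)] + path:        # backup
        n.visit_count += 1
        n.value_sum += value

root = Node(prior=1.0)
for _ in range(100):
    simulate(root, sequence=[])
best = max(root.children.items(), key=lambda kv: kv[1].visit_count)
print("most-visited first token:", best[0])
```

In the real system, expansion would query the LLM's policy for priors and the learned value network for leaf estimates instead of the random stand-in above.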
Quick Start & Requirements
```bash
conda create -n tsllm python==3.10
conda activate tsllm
pip install -r requirement.txt
pip install -e .
```
Key dependencies include transformers and ctranslate2 (v3.17.1 recommended); CTranslate2 requires converting models to its C++ backend for accelerated inference (via ct2-transformers-converter). Training involves deepspeed and accelerate.
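As an illustration of that conversion step (the model name and output directory below are placeholders, not the repository's actual choices), CTranslate2's Python converter mirrors the ct2-transformers-converter CLI:

```python
from ctranslate2.converters import TransformersConverter

# Convert a Hugging Face Transformers checkpoint into the CTranslate2
# format used for accelerated C++ inference. "gpt2" and "ct2_model" are
# placeholders for the actual checkpoint and output directory.
TransformersConverter("gpt2").convert("ct2_model", quantization="float16", force=True)
```

The equivalent command line is `ct2-transformers-converter --model gpt2 --output_dir ct2_model --quantization float16`.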
Highlighted Details
Evaluation supports multiple test modes (e.g., TEST_NO_TERMINAL, TEST_WITH_TERMINAL).
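The README does not explain these modes; as a purely hypothetical sketch of how such a switch might be consumed (the `TestMode` enum and `evaluate` function are invented for illustration, not taken from the repository's scripts):

```python
from enum import Enum

class TestMode(Enum):
    # Hypothetical values mirroring the mode names mentioned in the README.
    TEST_NO_TERMINAL = "test_no_terminal"
    TEST_WITH_TERMINAL = "test_with_terminal"

def evaluate(mode: TestMode) -> None:
    # Illustrative branch only; the real scripts define their own behavior.
    if mode is TestMode.TEST_WITH_TERMINAL:
        print("evaluating with terminal-state handling")
    else:
        print("evaluating without terminal-state handling")

evaluate(TestMode.TEST_NO_TERMINAL)
```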
Maintenance & Community
The project is associated with the ICML 2024 paper "AlphaZero-like Tree-Search Can Guide Large Language Model Decoding and Training." The code implementation references LightZero.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Per the README, SFT training is not performed for the RLHF task; a specific instruct model is used directly instead. Specific configuration details for testing and iterative updates are referenced within the script files and require careful examination.