LLM_Tree_Search by waterhorse1

Code and ICML 2024 paper for an AlphaZero-like tree search that guides LLM decoding and training

Created 1 year ago · 278 stars · Top 94.3% on sourcepulse

Project Summary

This repository provides TS-LLM, an AlphaZero-like tree-search framework for guiding Large Language Model (LLM) decoding and training. It is designed for researchers and practitioners looking to enhance LLM performance on complex reasoning tasks through structured search.

How It Works

TS-LLM integrates Monte Carlo Tree Search (MCTS) with LLMs by employing learned policy and value networks. These networks, trained via methods akin to AlphaZero, guide the MCTS exploration of the LLM's output space. This approach allows for more efficient and effective search compared to standard decoding methods, particularly for tasks requiring multi-step reasoning.
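
Below is a minimal Python sketch of the AlphaZero-style loop described above: select a path with the PUCT rule, expand a leaf with priors from the policy network, score it with the value network instead of a rollout, and backpropagate. The `policy_fn`/`value_fn` interfaces and the node layout are illustrative assumptions, not the repo's actual API.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                    # partial LLM output (prompt + generated steps)
    prior: float                  # policy-network probability of this step
    parent: "Node | None" = None
    children: dict = field(default_factory=dict)  # action text -> child Node
    visit_count: int = 0
    value_sum: float = 0.0

    def q(self) -> float:
        # Mean value of this node over all simulations that visited it
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def ucb(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # PUCT rule from AlphaZero: exploit mean value, explore by prior * sqrt(N)/(1+n)
    return child.q() + c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)

def mcts_search(root: Node, policy_fn, value_fn, num_simulations: int = 50) -> str:
    """policy_fn(state) -> {action: prior}; value_fn(state) -> float.
    Both would be LLM-based learned networks in TS-LLM; here they are placeholders."""
    for _ in range(num_simulations):
        node = root
        # 1. Selection: descend with PUCT until a leaf is reached
        while node.children:
            node = max(node.children.values(), key=lambda ch: ucb(node, ch))
        # 2. Expansion: query the policy network for candidate next steps
        for action, p in policy_fn(node.state).items():
            node.children[action] = Node(state=node.state + action, prior=p, parent=node)
        # 3. Evaluation: the learned value network scores the leaf (no rollout)
        value = value_fn(node.state)
        # 4. Backpropagation: update statistics along the selected path
        while node is not None:
            node.visit_count += 1
            node.value_sum += value
            node = node.parent
    # Act greedily with respect to visit counts, as in AlphaZero
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]
```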

Quick Start & Requirements

  • Installation: conda create -n tsllm python==3.10, conda activate tsllm, pip install -r requirement.txt, pip install -e . (consolidated in the setup sketch after this list)
  • Prerequisites: Python 3.10, transformers, ctranslate2 (v3.17.1 recommended). CTranslate2 requires converting models to its own format, which runs on a C++ backend for accelerated inference.
  • Resources: Hugging Face models must be converted to CTranslate2 format (e.g., via ct2-transformers-converter). Training uses DeepSpeed and Accelerate.
  • Links: Hugging Face Models
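
The following shell sketch consolidates the README's installation commands and adds an illustrative CTranslate2 conversion step; the model and output paths are placeholders, not paths from the repo.

```bash
# Environment setup (commands from the README)
conda create -n tsllm python==3.10
conda activate tsllm
pip install -r requirement.txt
pip install -e .

# Convert a Hugging Face checkpoint to CTranslate2 format for accelerated
# inference. Paths are illustrative; --force overwrites an existing output dir.
ct2-transformers-converter --model path/to/hf_policy_model \
    --output_dir path/to/ct2_policy_model --force
```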

Highlighted Details

  • Utilizes CTranslate2 for significantly faster LLM inference (see the inference sketch after this list).
  • Provides pre-trained policy and value networks for GSM8k, Game24, and ProntoQA tasks on Hugging Face.
  • Supports both supervised fine-tuning (SFT) and reinforcement learning (RLHF) training paradigms.
  • Implements distinct testing modes for different scenarios (e.g., TEST_NO_TERMINAL, TEST_WITH_TERMINAL).
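
As a rough illustration of the CTranslate2-backed inference path, the sketch below loads a converted model and generates greedily. It uses the public ctranslate2 and transformers APIs; the model paths are the placeholders from the conversion step above, not the repo's actual checkpoints.

```python
import ctranslate2
import transformers

# Load the converted model (output of ct2-transformers-converter) on GPU.
generator = ctranslate2.Generator("path/to/ct2_policy_model", device="cuda")
# The tokenizer still comes from the original Hugging Face checkpoint.
tokenizer = transformers.AutoTokenizer.from_pretrained("path/to/hf_policy_model")

prompt = "Question: Natalia sold clips to 48 of her friends. ..."
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# Greedy decoding (sampling_topk=1); CTranslate2 runs the forward passes
# on its C++ backend, which is where the speedup comes from.
results = generator.generate_batch([start_tokens], max_length=256, sampling_topk=1)
print(tokenizer.decode(results[0].sequences_ids[0]))
```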

Maintenance & Community

The project accompanies the ICML 2024 paper "AlphaZero-like Tree-Search can Guide Large Language Model Decoding and Training." The code implementation references LightZero.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that no SFT training is performed for the RLHF setting; an existing instruct model is used directly instead. Specific configuration details for testing and iterative updates are referenced within the script files and require careful examination.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days
