Research code for improving small-LLM reasoning via self-play mutual reasoning
rStar enhances the reasoning capabilities of smaller language models (SLMs) by employing a self-play mutual reasoning approach. This method is designed for researchers and practitioners aiming to improve SLM performance on complex problem-solving tasks without requiring fine-tuning or access to larger, more capable models.
How It Works
rStar decouples reasoning into a generation and discrimination process. A target SLM augments Monte Carlo Tree Search (MCTS) with human-like reasoning actions to create high-quality reasoning trajectories. A second SLM, with similar capabilities, acts as a discriminator to verify these trajectories. Trajectories mutually agreed upon by both models are considered more reliable and correct, leading to improved problem-solving accuracy.
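The mutual-agreement step above can be sketched in Python. This is a hypothetical simplification, not rStar's actual implementation: it assumes the generator's MCTS rollouts and the discriminator's completions have each been reduced to final answers, and keeps only answers on which both models agree.

```python
from collections import Counter

def select_mutually_consistent(generator_answers, discriminator_answers):
    """Pick the answer that both SLMs reach on the same trajectory.

    generator_answers: final answers from the target SLM's MCTS rollouts
    discriminator_answers: answers the second SLM reaches when re-completing
        the same trajectories (a simplified stand-in for rStar's
        mutual-consistency verification)
    """
    # Keep trajectories where generator and discriminator agree
    agreed = [g for g, d in zip(generator_answers, discriminator_answers) if g == d]
    # Fall back to simple majority voting over the generator if nothing agrees
    pool = agreed if agreed else generator_answers
    return Counter(pool).most_common(1)[0][0]
```

In this sketch, mutual agreement acts as a filter before voting, which is why agreed-upon trajectories dominate the final answer even when the generator alone is noisy.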
Quick Start & Requirements
```bash
# 1. Generate candidate reasoning trajectories with the target SLM
bash scripts/run_gsm8k_generator.sh --dataset_name <dataset> --model_ckpt <path_to_model>

# 2. Evaluate the generator's output
python eval_src/do_eval.py --dataset_name <dataset> --exp_dir_path <generator_output_folder>

# 3. Run the discriminator SLM to verify the trajectories
bash scripts/run_gsm8k_discriminator.sh --model_ckpt <path_to_discriminator> --root_dir <evaluation_results_folder> --dataset_name <dataset>
```
Limitations & Caveats
The README does not specify a license, which may affect commercial use or closed-source integration. Setup requires specific versions of PyTorch and vllm, plus CUDA 12+, which may pose compatibility challenges.