Research code for improving small-LLM reasoning via self-play mutual reasoning
rStar enhances the reasoning capabilities of smaller language models (SLMs) by employing a self-play mutual reasoning approach. This method is designed for researchers and practitioners aiming to improve SLM performance on complex problem-solving tasks without requiring fine-tuning or access to larger, more capable models.
How It Works
rStar decouples reasoning into a generation and discrimination process. A target SLM augments Monte Carlo Tree Search (MCTS) with human-like reasoning actions to create high-quality reasoning trajectories. A second SLM, with similar capabilities, acts as a discriminator to verify these trajectories. Trajectories mutually agreed upon by both models are considered more reliable and correct, leading to improved problem-solving accuracy.
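The mutual-agreement step above can be sketched in Python. This is a hypothetical simplification, not rStar's actual implementation: it assumes the generator's MCTS rollouts and the discriminator's completions have each been reduced to final answers, and keeps only answers on which both models agree.

```python
from collections import Counter

def select_mutually_consistent(generator_answers, discriminator_answers):
    """Pick the answer that both SLMs reach on the same trajectory.

    generator_answers: final answers from the target SLM's MCTS rollouts
    discriminator_answers: answers the second SLM reaches when re-completing
        the same trajectories (a simplified stand-in for rStar's
        mutual-consistency verification)
    """
    # Keep trajectories where generator and discriminator agree
    agreed = [g for g, d in zip(generator_answers, discriminator_answers) if g == d]
    # Fall back to simple majority voting over the generator if nothing agrees
    pool = agreed if agreed else generator_answers
    return Counter(pool).most_common(1)[0][0]
```

In this sketch, mutual agreement acts as a filter before voting, which is why agreed-upon trajectories dominate the final answer even when the generator alone is noisy.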
Quick Start & Requirements
```bash
# 1. Generate candidate reasoning trajectories with the target SLM
bash scripts/run_gsm8k_generator.sh --dataset_name <dataset> --model_ckpt <path_to_model>

# 2. Evaluate the generator's output
python eval_src/do_eval.py --dataset_name <dataset> --exp_dir_path <generator_output_folder>

# 3. Run the discriminator SLM to verify the trajectories
bash scripts/run_gsm8k_discriminator.sh --model_ckpt <path_to_discriminator> --root_dir <evaluation_results_folder> --dataset_name <dataset>
```
Limitations & Caveats
The README does not specify a license, which may affect commercial use or closed-source integration. Setup requires specific versions of PyTorch and vllm, plus CUDA 12+, which may pose compatibility challenges.