rStar by zhentingqi

Research paper for improving small LLM reasoning via mutual reasoning

created 1 year ago
953 stars

Top 39.4% on sourcepulse

Project Summary

rStar enhances the reasoning capabilities of small language models (SLMs) through a self-play mutual reasoning approach. It is aimed at researchers and practitioners who want to improve SLM performance on complex problem-solving tasks without fine-tuning or access to larger, more capable models.

How It Works

rStar decouples reasoning into a generation process and a discrimination process. A target SLM augments Monte Carlo Tree Search (MCTS) with a set of human-like reasoning actions to generate high-quality reasoning trajectories. A second SLM of similar capability acts as a discriminator and verifies those trajectories. Trajectories on which both models agree are treated as more reliable, leading to improved problem-solving accuracy.
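The generation/discrimination loop above can be sketched in a few lines. This is a minimal illustration of the mutual-agreement idea, not the repository's actual API: the function name, the trajectory representation, and the `verify` callback are all assumptions made for the sketch.

```python
# Illustrative sketch of mutual-agreement answer selection.
# A "trajectory" here is a (reasoning_steps, final_answer) pair produced
# by the generator SLM's MCTS rollouts; `verify` stands in for the
# discriminator SLM's judgment. Both shapes are hypothetical.
from collections import Counter

def select_answer(trajectories, verify):
    """Keep trajectories the discriminator also accepts, then
    majority-vote over their final answers."""
    agreed = [answer for steps, answer in trajectories if verify(steps)]
    if not agreed:
        # Fall back to plain self-consistency over all rollouts
        # if the discriminator rejects everything.
        agreed = [answer for _, answer in trajectories]
    # The most frequent mutually agreed answer is taken as final.
    return Counter(agreed).most_common(1)[0][0]
```

The key design point is that agreement between two independently sampled models is a cheap proxy for correctness when no larger verifier model is available.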

Quick Start & Requirements

  • Install: Clone the repository and install dependencies.
  • Prerequisites: Python 3.10, CUDA 12+, and recent versions of PyTorch, transformers, and vLLM.
  • Usage:
    • Generator: bash scripts/run_gsm8k_generator.sh --dataset_name <dataset> --model_ckpt <path_to_model>
    • Evaluator: python eval_src/do_eval.py --dataset_name <dataset> --exp_dir_path <generator_output_folder>
    • Discriminator: bash scripts/run_gsm8k_discriminator.sh --model_ckpt <path_to_discriminator> --root_dir <evaluation_results_folder> --dataset_name <dataset>
  • Paper: https://arxiv.org/abs/2408.06195

Highlighted Details

  • Significantly boosts GSM8K accuracy: LLaMA2-7B from 12.51% to 63.91%, Mistral-7B from 36.46% to 81.88%, LLaMA3-8B-Instruct from 74.53% to 91.13%.
  • Supports multiple reasoning datasets: MATH, GSM8K, GSM8KHARD, STG, SVAMP, MULTIARITH.
  • Integrates with MCTS for enhanced exploration and verification.
  • Recommended in Awesome LLM Strawberry (OpenAI o1).
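The MCTS integration noted above relies on a node-selection rule to balance exploring new reasoning actions against exploiting promising ones. A minimal sketch of the standard UCT selection rule follows; it is illustrative only and does not reflect the repository's implementation details, node structure, or exploration constant.

```python
# Illustrative UCT (Upper Confidence bound for Trees) selection.
# Each child is a dict with 'visits' (int) and 'value' (total reward);
# this node representation is an assumption for the sketch.
import math

def uct_select(children, c=1.4):
    """Return the child maximizing value/visits + exploration bonus."""
    total = sum(child["visits"] for child in children)

    def score(child):
        if child["visits"] == 0:
            return float("inf")  # always expand unvisited actions first
        exploit = child["value"] / child["visits"]
        explore = c * math.sqrt(math.log(total) / child["visits"])
        return exploit + explore

    return max(children, key=score)
```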

Maintenance & Community

  • Accepted at ICLR 2025.
  • Follow-up work: rStar-Math.

Licensing & Compatibility

  • License: Not explicitly stated in the README.

Limitations & Caveats

The README does not state a license, which may complicate commercial use or closed-source integration. The setup requires recent PyTorch and vLLM builds plus CUDA 12+, which can pose compatibility challenges on older GPUs or driver stacks.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 30 stars in the last 90 days
