openr by openreasoner

Open-source framework for advanced LLM reasoning

created 9 months ago
1,804 stars

Top 24.4% on sourcepulse

View on GitHub
Project Summary

OpenR is an open-source framework designed to enhance the reasoning capabilities of Large Language Models (LLMs). It targets researchers and developers working on complex problem-solving tasks, particularly in areas like mathematical reasoning, by providing tools for data generation, policy training, and advanced search strategies. The framework aims to improve LLM performance through process supervision and reinforcement learning techniques.

How It Works

OpenR employs a multi-faceted approach to LLM reasoning. It supports process-supervision data generation using methods like OmegaPRM, enabling models to learn from intermediate reasoning steps rather than just final outcomes. For training, it integrates online policy training with algorithms such as APPO, GRPO, and TPPO, alongside generative and discriminative PRM training. The framework also offers diverse search strategies, including Greedy Search, Best-of-N, Beam Search, MCTS, and rStar, allowing for flexible exploration of reasoning paths.
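As an illustration of the simplest of these strategies, Best-of-N samples several candidate answers from the policy model and keeps the one a reward model scores highest. The `generate` and `score` callables below are hypothetical stand-ins, not OpenR's actual API; this is a minimal sketch of the idea, with toy functions standing in for real models.

```python
def best_of_n(question, generate, score, n=8):
    """Sample n candidate answers and return the one the reward model prefers."""
    candidates = [generate(question) for _ in range(n)]
    # Score each candidate with the (process or outcome) reward model.
    scored = [(score(question, c), c) for c in candidates]
    # Return the highest-scoring candidate answer.
    return max(scored, key=lambda pair: pair[0])[1]

# Toy stand-ins for demonstration only: a "policy" that cycles through fixed
# answers and a "reward model" that happens to prefer longer answers.
import itertools

_answers = itertools.cycle(["4", "2 + 2 = 4", "the answer is 4 because 2 + 2 = 4"])

def toy_generate(question):
    return next(_answers)

def toy_score(question, answer):
    return len(answer)
```

In practice `score` would be a trained PRM or outcome reward model, and `generate` a sampled decode from the LM; the selection logic itself stays this simple.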

Quick Start & Requirements

  • Installation: Use Conda to create an environment (conda create -n open_reasoner python=3.10) and activate it (conda activate open_reasoner), then install dependencies (pip install -r requirements.txt, pip3 install "fschat[model_worker,webui]", pip install -U pydantic). latex2sympy requires an additional setup step.
  • Prerequisites: Requires downloading specific base models (e.g., Qwen2.5-Math, Mistral variants) from Hugging Face. Configuration involves setting environment variables for model paths and worker counts.
  • Resources: Running inference requires starting LM and RM services. Training involves modifying script parameters for dataset and model paths.
  • Links: Paper, Docs, Demo
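The installation steps above can be consolidated into one session (the commands follow the project README as summarized here; treat this as a sketch of the setup, not a verified recipe):

```shell
# Create and activate the Conda environment (Python 3.10, per the README).
conda create -n open_reasoner python=3.10
conda activate open_reasoner

# Install the project's Python dependencies.
pip install -r requirements.txt
pip3 install "fschat[model_worker,webui]"
pip install -U pydantic

# latex2sympy needs its own additional setup step; see the repository docs.
```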

Highlighted Details

  • Supports MCTS reasoning and rStar for improved LLM problem-solving.
  • Offers online policy training with APPO, GRPO, and TPPO.
  • Features process-supervision data generation (OmegaPRM) and generative RM training.
  • Includes benchmark results showing competitive performance on reasoning tasks.
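Among the search strategies listed above, beam search over reasoning steps pairs naturally with a process reward model: partial chains are extended one step at a time, and only the top-scoring chains survive each round. The `expand` and `score_step` callables below are hypothetical stand-ins for a step proposer and a PRM, not OpenR's actual interfaces; this is a minimal sketch of the control flow.

```python
def beam_search(question, expand, score_step, beam_width=4, max_steps=5):
    """Grow reasoning chains step by step, keeping the top partial chains.

    expand(question, steps) proposes candidate next steps for a partial chain;
    score_step(question, steps) rates a partial chain (e.g. with a PRM).
    """
    beams = [([], 0.0)]  # each beam: (steps so far, cumulative score)
    for _ in range(max_steps):
        candidates = []
        for steps, total in beams:
            for nxt in expand(question, steps):
                new_steps = steps + [nxt]
                candidates.append((new_steps, total + score_step(question, new_steps)))
        # Keep only the beam_width highest-scoring partial chains.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]  # steps of the best surviving chain
```

MCTS and rStar replace this fixed-width frontier with tree statistics and mutual-verification rollouts, but the same step-level scoring signal drives the exploration.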

Maintenance & Community

The project is maintained by the Openreasoner Team. Contributions are welcome, and contributor guidance is provided. Community engagement is encouraged via WeChat.

Licensing & Compatibility

OpenR is released under the MIT License, which permits commercial use and integration with closed-source projects.

Limitations & Caveats

The README indicates that test-time computation and scaling laws are "TBA" and lists multi-modal reasoning and reasoning in code generation as future TODOs, suggesting these areas may be less mature or incomplete.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 49 stars in the last 90 days
