ReasonFlux by Gen-Verse

LLM post-training algorithms for data selection, RL, and inference

created 5 months ago
465 stars

Top 66.2% on sourcepulse

Project Summary

ReasonFlux introduces a novel template-augmented reasoning paradigm for Large Language Models (LLMs), aiming to enhance performance on complex reasoning tasks. It targets researchers and developers seeking to improve LLM capabilities in areas like mathematics and general question answering, offering a method to scale reasoning abilities through structured thought processes.

How It Works

ReasonFlux employs a hierarchical approach, leveraging "thought templates" to guide LLM reasoning. This involves a "navigator" component that selects appropriate templates from a library based on the problem context, and an "inference" model that executes the reasoning steps guided by these templates. This method allows smaller models to achieve performance comparable to or exceeding larger, more general-purpose models on specific reasoning benchmarks.
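The navigator's template selection described above can be sketched as a nearest-neighbor lookup over template embeddings. The `ThoughtTemplate` structure, the hand-made 3-d vectors, and the function names below are illustrative assumptions for exposition, not the project's actual API; a real deployment would embed templates with a text embedding model and hand the chosen template's steps to the inference LLM.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class ThoughtTemplate:
    """A reusable reasoning scaffold: a name, an embedding of its
    description, and the ordered steps a guided model should follow."""
    name: str
    embedding: list[float]
    steps: list[str]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 for zero-norm inputs)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def navigate(problem_embedding: list[float],
             library: list[ThoughtTemplate]) -> ThoughtTemplate:
    """Navigator step: pick the template closest to the problem embedding."""
    return max(library, key=lambda t: cosine(problem_embedding, t.embedding))

# Toy library with hypothetical 3-d "embeddings"; a real system would use a
# text embedding model to encode template descriptions.
library = [
    ThoughtTemplate("quadratic_formula", [1.0, 0.1, 0.0],
                    ["identify a, b, c", "compute discriminant", "apply formula"]),
    ThoughtTemplate("case_analysis", [0.0, 1.0, 0.2],
                    ["enumerate cases", "solve each case", "combine results"]),
]

chosen = navigate([0.9, 0.2, 0.0], library)
print(chosen.name)  # -> quadratic_formula (nearest to this problem vector)
```

The inference model would then be prompted step by step with `chosen.steps`, which is the "execution" half of the hierarchy.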

Quick Start & Requirements

  • Install: Clone the repository and set up a Conda environment (conda create -n ReasonFlux python==3.9, conda activate ReasonFlux, pip install -r requirements.txt).
  • Prerequisites: Python 3.9, llama-factory for training, lm-evaluation-harness for evaluation, and vllm for inference. Note: Avoid installing flash-attn if using jina-embedding-v3 due to potential conflicts.
  • Resources: Training requires significant GPU resources (e.g., 8x A100 GPUs for a 32B model). Inference with vllm is also resource-intensive.
  • Links: Model Zoo, ReasonFlux-F1 README, LLaMA-Factory.

Highlighted Details

  • ReasonFlux-F1-32B outperforms models like o1-mini and DeepSeek-R1-Distill-32B on MATH500 (96.0 vs 90.0/94.3) and AIME2024 (76.7 vs 56.7/72.6).
  • Supports training and inference for multiple model sizes (7B, 14B, 32B).
  • Utilizes a template library for structured reasoning, with an embedding-based retrieval mechanism.
  • Built upon preliminary works like "Buffer of Thoughts" and "SuperCorrect".

Maintenance & Community

  • The project is associated with the paper "ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates" (arXiv:2502.06772).
  • Recent updates include the release of ReasonFlux-F1 models and associated training/inference code.

Licensing & Compatibility

  • The repository appears to be released under a permissive license, but specific terms are not explicitly detailed in the README. Model weights are available on HuggingFace.

Limitations & Caveats

  • The inference code for ReasonFlux-Zero requires specific paths for navigator, template matcher, and inference models, which need to be provided by the user.
  • Potential dependency conflicts exist, particularly between flash-attn and jina-embedding-v3.
  • Evaluation requires specific setup of the lm-evaluation-harness framework.
Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 91 stars in the last 90 days
