XBai-o4 by MetaStone-AI

Advanced LLM for complex reasoning

created 1 week ago


286 stars

Top 91.5% on SourcePulse

View on GitHub
Project Summary

XBai-o4 is an open-source large language model family designed for complex reasoning tasks, targeting researchers and developers seeking high-quality reasoning trajectories. It offers competitive performance on benchmarks like AIME and LiveCodeBench, aiming to provide a cost-effective alternative to proprietary models.

How It Works

XBai-o4 utilizes a novel "reflective generative form" that unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning." This approach enables a single model to perform deep reasoning and select high-quality reasoning paths. By sharing a backbone network between Process Reward Models (PRMs) and policy models, XBai-o4 achieves a 99% reduction in PRM inference cost, leading to faster and more accurate responses.
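
As a rough illustration of the shared-backbone idea, the sketch below attaches a lightweight process-reward head to the same hidden states that feed the policy's language-model head, so scoring candidate reasoning paths costs one extra linear layer rather than a separate PRM forward pass. This is a minimal, hypothetical PyTorch sketch; the class and function names are illustrative and not taken from the XBai-o4 codebase.

    # Minimal sketch (not the official implementation): a policy LM head and a
    # process-reward head share one backbone, so candidate reasoning paths can
    # be scored and the best one selected without a separate PRM model.
    import torch
    import torch.nn as nn

    class SharedBackbonePolicy(nn.Module):
        def __init__(self, vocab_size=1000, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.backbone = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for the transformer
            self.lm_head = nn.Linear(hidden, vocab_size)  # policy: next-token logits
            self.reward_head = nn.Linear(hidden, 1)       # PRM: per-step quality score

        def forward(self, token_ids):
            hidden_states, _ = self.backbone(self.embed(token_ids))  # shared hidden states
            return self.lm_head(hidden_states), self.reward_head(hidden_states).squeeze(-1)

    def select_best_trajectory(model, candidates):
        """Score each sampled reasoning path with the reward head, keep the best."""
        with torch.no_grad():
            scores = [model(traj.unsqueeze(0))[1].mean().item() for traj in candidates]
        return candidates[scores.index(max(scores))]

    model = SharedBackbonePolicy()
    candidates = [torch.randint(0, 1000, (32,)) for _ in range(4)]  # 4 sampled reasoning paths
    best = select_best_trajectory(model, candidates)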

Quick Start & Requirements

  • Installation: Requires Python 3.10, verl, flash_attn==2.7.4.post1, and other dependencies listed in requirements.txt. Installation is via pip install -e verl and pip install -r requirements.txt.
  • Prerequisites: Conda environment setup is recommended.
  • Resources: Training and evaluation scripts are provided for single-node and multi-node setups using Ray. Model conversion to Hugging Face format is supported (see the loading sketch after this list).
  • Links: ModelCard, Paper
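
Once a checkpoint has been converted to Hugging Face format, it should be loadable with the standard transformers API. The repository id below is an assumption; check the ModelCard link for the actual id and recommended generation settings.

    # Hedged usage sketch: load a converted Hugging Face checkpoint with the
    # standard transformers API. The model id is an assumption; use the id from
    # the ModelCard. device_map="auto" additionally requires the accelerate package.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "MetaStoneTec/XBai-o4"  # assumed id, verify against the ModelCard
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

    prompt = "Solve: if 3x + 7 = 22, what is x?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))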

Highlighted Details

  • XBai o4-medium achieves 85.4 on AIME24 and 77.6 on AIME25.
  • Outperforms GPT-4o-0513 and Claude-3.5-Sonnet-1022 on AIME benchmarks.
  • Offers three variants: low, medium, and high, with performance scaling accordingly.
  • Supports training and evaluation pipelines, including API endpoints for reward and policy models (an illustrative client sketch follows this list).
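
For orientation, the following sketch shows what a best-of-N evaluation loop against two such endpoints could look like. The URLs and JSON fields are placeholders invented for illustration, not the project's actual API; the real request format is defined by the provided evaluation scripts.

    # Illustrative best-of-N client against two hypothetical HTTP endpoints
    # (one serving the policy model, one the reward model). Endpoint URLs and
    # JSON fields are placeholders, not the project's actual API.
    import requests

    POLICY_URL = "http://localhost:8000/generate"  # placeholder policy endpoint
    REWARD_URL = "http://localhost:8001/score"     # placeholder reward endpoint

    def best_of_n(prompt: str, n: int = 8) -> str:
        # Sample n candidate solutions from the policy server.
        candidates = [
            requests.post(POLICY_URL, json={"prompt": prompt}).json()["text"]
            for _ in range(n)
        ]
        # Score each candidate with the reward server and keep the best one.
        scores = [
            requests.post(REWARD_URL, json={"prompt": prompt, "response": c}).json()["score"]
            for c in candidates
        ]
        return candidates[scores.index(max(scores))]

    print(best_of_n("Prove that the sum of two even integers is even."))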

Maintenance & Community

The project is associated with authors from various institutions, as indicated by the citation. Community channels or specific maintainer information are not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. The presence of code and model weights implies an open-source release, but specific terms for commercial use or closed-source linking are not provided.

Limitations & Caveats

The evaluation pipeline requires setting up separate API endpoints for reward and policy models, which adds complexity. Performance on LiveCodeBench v5 is provided for some models, but a full suite of benchmarks is not detailed. The project is presented as a research release, and long-term maintenance is not guaranteed.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1

Star History

288 stars in the last 12 days
