XBai-o4 by MetaStone-AI

Advanced LLM for complex reasoning

Created 2 months ago
300 stars

Top 88.6% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

XBai-o4 is an open-source large language model family designed for complex reasoning tasks, targeting researchers and developers seeking high-quality reasoning trajectories. It offers competitive performance on benchmarks like AIME and LiveCodeBench, aiming to provide a cost-effective alternative to proprietary models.

How It Works

XBai-o4 utilizes a novel "reflective generative form" that unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning." This approach enables a single model to perform deep reasoning and select high-quality reasoning paths. By sharing a backbone network between Process Reward Models (PRMs) and policy models, XBai-o4 achieves a 99% reduction in PRM inference cost, leading to faster and more accurate responses.
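The shared-backbone idea can be illustrated with a toy sketch: the expensive backbone pass is computed once per candidate reasoning path, and a lightweight reward head reuses those features to pick the best path. All names here (backbone, reward_head, select_best_path) are illustrative stand-ins, not XBai-o4's actual API.

```python
# Toy sketch of a shared-backbone policy/PRM. The point: scoring N candidate
# paths reuses the one expensive backbone pass per path, so reward scoring
# adds only a cheap head computation instead of a second full model.
# Names and logic are illustrative, not the project's implementation.

def backbone(path: str) -> list[float]:
    """Stand-in for the expensive transformer forward pass."""
    # Here: a trivial character-frequency "embedding".
    return [path.count(c) / max(len(path), 1) for c in "abcde"]

def reward_head(features: list[float]) -> float:
    """Cheap linear head scoring path quality from shared backbone features."""
    weights = [0.5, 1.0, -0.2, 0.3, 0.1]
    return sum(w * f for w, f in zip(weights, features))

def select_best_path(candidate_paths: list[str]) -> str:
    """Best-of-N selection: score each candidate once, keep the top one."""
    return max(candidate_paths, key=lambda p: reward_head(backbone(p)))

candidates = ["step a then b", "bbb only", "c and d and e"]
best = select_best_path(candidates)
```

In a real system the backbone would be the transformer trunk shared by the policy and the PRM head, which is where the claimed reduction in PRM inference cost comes from.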

Quick Start & Requirements

  • Installation: Requires Python 3.10, verl, flash_attn==2.7.4.post1, and other dependencies listed in requirements.txt. Installation is via pip install -e verl and pip install -r requirements.txt.
  • Prerequisites: Conda environment setup is recommended.
  • Resources: Training and evaluation scripts are provided for single-node and multi-node setups using Ray. Model conversion to Huggingface format is supported.
  • Links: ModelCard, Paper
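The setup steps above can be sketched as a shell session. The environment name and the repository URL are assumptions for illustration; the pip commands and Python version follow the bullets.

```shell
# Sketch of the setup described above; adjust names and paths to your checkout.
conda create -n xbai-o4 python=3.10 -y
conda activate xbai-o4

git clone https://github.com/MetaStone-AI/XBai-o4.git   # repo URL assumed
cd XBai-o4

pip install -e verl                    # editable install of the verl subpackage
pip install -r requirements.txt        # includes flash_attn==2.7.4.post1
```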

Highlighted Details

  • XBai-o4-medium achieves 85.4 on AIME24 and 77.6 on AIME25.
  • Outperforms GPT-4o-0513 and Claude-3.5-Sonnet-1022 on AIME benchmarks.
  • Offers three variants: low, medium, and high, with performance scaling accordingly.
  • Supports training and evaluation pipelines, including API endpoints for reward and policy models.

Maintenance & Community

The project is associated with authors from various institutions, as indicated by the citation. Neither community channels nor specific maintainer information is detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. The presence of code and model weights implies an open-source release, but specific terms for commercial use or closed-source linking are not provided.

Limitations & Caveats

The evaluation pipeline requires setting up separate API endpoints for reward and policy models, which adds complexity. Performance on LiveCodeBench v5 is provided for some models, but a full suite of benchmarks is not detailed. The project is presented as a research release, and long-term maintenance is not guaranteed.

Health Check
Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
5 stars in the last 30 days

Explore Similar Projects

Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 19 more.

DeepSeek-R1 by deepseek-ai

0.1%
91k
Reasoning models research paper
Created 8 months ago
Updated 3 months ago