XBai-o4 by MetaStone-AI

Advanced LLM for complex reasoning

created 1 week ago


286 stars

Top 91.5% on SourcePulse

View on GitHub
Project Summary

XBai-o4 is an open-source large language model family designed for complex reasoning tasks, targeting researchers and developers seeking high-quality reasoning trajectories. It offers competitive performance on benchmarks like AIME and LiveCodeBench, aiming to provide a cost-effective alternative to proprietary models.

How It Works

XBai-o4 utilizes a novel "reflective generative form" that unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning." This approach enables a single model to perform deep reasoning and select high-quality reasoning paths. By sharing a backbone network between Process Reward Models (PRMs) and policy models, XBai-o4 achieves a 99% reduction in PRM inference cost, leading to faster and more accurate responses.
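
As a rough illustration of the shared-backbone idea, the sketch below attaches a lightweight process-reward head to the same hidden states that feed the policy's language-model head, so scoring candidate reasoning paths costs one extra linear layer rather than a separate PRM forward pass. This is a minimal, hypothetical PyTorch sketch; the class and function names are illustrative and not taken from the XBai-o4 codebase.

    # Minimal sketch (not the official implementation): a policy LM head and a
    # process-reward head share one backbone, so candidate reasoning paths can
    # be scored and the best one selected without a separate PRM model.
    import torch
    import torch.nn as nn

    class SharedBackbonePolicy(nn.Module):
        def __init__(self, vocab_size=1000, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.backbone = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for the transformer
            self.lm_head = nn.Linear(hidden, vocab_size)  # policy: next-token logits
            self.reward_head = nn.Linear(hidden, 1)       # PRM: per-step quality score

        def forward(self, token_ids):
            hidden_states, _ = self.backbone(self.embed(token_ids))  # shared hidden states
            return self.lm_head(hidden_states), self.reward_head(hidden_states).squeeze(-1)

    def select_best_trajectory(model, candidates):
        """Score each sampled reasoning path with the reward head, keep the best."""
        with torch.no_grad():
            scores = [model(traj.unsqueeze(0))[1].mean().item() for traj in candidates]
        return candidates[scores.index(max(scores))]

    model = SharedBackbonePolicy()
    candidates = [torch.randint(0, 1000, (32,)) for _ in range(4)]  # 4 sampled reasoning paths
    best = select_best_trajectory(model, candidates)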

Quick Start & Requirements

  • Installation: Requires Python 3.10, verl, flash_attn==2.7.4.post1, and other dependencies listed in requirements.txt. Installation is via pip install -e verl and pip install -r requirements.txt.
  • Prerequisites: Conda environment setup is recommended.
  • Resources: Training and evaluation scripts are provided for single-node and multi-node setups using Ray. Model conversion to Hugging Face format is supported (see the loading sketch after this list).
  • Links: ModelCard, Paper
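
Once a checkpoint has been converted to Hugging Face format, it should be loadable with the standard transformers API. The repository id below is an assumption; check the ModelCard link for the actual id and recommended generation settings.

    # Hedged usage sketch: load a converted Hugging Face checkpoint with the
    # standard transformers API. The model id is an assumption; use the id from
    # the ModelCard. device_map="auto" additionally requires the accelerate package.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "MetaStoneTec/XBai-o4"  # assumed id, verify against the ModelCard
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

    prompt = "Solve: if 3x + 7 = 22, what is x?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))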

Highlighted Details

  • XBai o4-medium achieves 85.4 on AIME24 and 77.6 on AIME25.
  • Outperforms GPT-4o-0513 and Claude-3.5-Sonnet-1022 on AIME benchmarks.
  • Offers three variants: low, medium, and high, with performance scaling accordingly.
  • Supports training and evaluation pipelines, including API endpoints for reward and policy models (an illustrative client sketch follows this list).
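
For orientation, the following sketch shows what a best-of-N evaluation loop against two such endpoints could look like. The URLs and JSON fields are placeholders invented for illustration, not the project's actual API; the real request format is defined by the provided evaluation scripts.

    # Illustrative best-of-N client against two hypothetical HTTP endpoints
    # (one serving the policy model, one the reward model). Endpoint URLs and
    # JSON fields are placeholders, not the project's actual API.
    import requests

    POLICY_URL = "http://localhost:8000/generate"  # placeholder policy endpoint
    REWARD_URL = "http://localhost:8001/score"     # placeholder reward endpoint

    def best_of_n(prompt: str, n: int = 8) -> str:
        # Sample n candidate solutions from the policy server.
        candidates = [
            requests.post(POLICY_URL, json={"prompt": prompt}).json()["text"]
            for _ in range(n)
        ]
        # Score each candidate with the reward server and keep the best one.
        scores = [
            requests.post(REWARD_URL, json={"prompt": prompt, "response": c}).json()["score"]
            for c in candidates
        ]
        return candidates[scores.index(max(scores))]

    print(best_of_n("Prove that the sum of two even integers is even."))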

Maintenance & Community

The project is associated with authors from various institutions, as indicated by the citation. Community channels or specific maintainer information are not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. The presence of code and model weights implies an open-source release, but specific terms for commercial use or closed-source linking are not provided.

Limitations & Caveats

The evaluation pipeline requires setting up separate API endpoints for reward and policy models, which adds complexity. Performance on LiveCodeBench v5 is provided for some models, but a full suite of benchmarks is not detailed. The project is presented as a research release, and long-term maintenance is not guaranteed.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1

Star History

288 stars in the last 12 days
