Advanced LLM for complex reasoning
XBai-o4 is an open-source large language model family designed for complex reasoning tasks, targeting researchers and developers seeking high-quality reasoning trajectories. It offers competitive performance on benchmarks like AIME and LiveCodeBench, aiming to provide a cost-effective alternative to proprietary models.
How It Works
XBai-o4 uses what its authors call a "reflective generative form," which unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning" so that a single model can both reason deeply and select high-quality reasoning paths. Because the Process Reward Model (PRM) and the policy model share a backbone network, the project reports a 99% reduction in PRM inference cost, which it credits for faster, higher-quality responses.
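As a rough illustration of why a shared backbone makes scoring cheap, the toy sketch below counts backbone passes when a reward head reuses features that generation would already have produced. Everything in it (`ToySharedModel`, the made-up feature function, the scoring rule) is hypothetical and not taken from the XBai-o4 code; it shows only the bookkeeping and does not reproduce the 99% figure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToySharedModel:
    """Hypothetical stand-in: one backbone with a cheap reward head on top."""
    backbone_passes: int = 0

    def backbone(self, steps: Sequence[str]) -> List[float]:
        # Stand-in for the expensive forward pass; one made-up feature per step.
        self.backbone_passes += 1
        return [len(step) / 10.0 for step in steps]

    def reward_head(self, hidden: Sequence[float]) -> float:
        # Cheap head on cached features: mean feature squashed into [0, 1).
        if not hidden:
            return 0.0
        mean = sum(hidden) / len(hidden)
        return mean / (1.0 + mean)


def score_with_shared_backbone(model: ToySharedModel, steps: Sequence[str]) -> float:
    hidden = model.backbone(steps)      # features the policy would already have
    return model.reward_head(hidden)    # no second backbone pass needed


def score_with_separate_prm(policy: ToySharedModel, prm: ToySharedModel,
                            steps: Sequence[str]) -> float:
    policy.backbone(steps)              # pass made by the policy model
    hidden = prm.backbone(steps)        # extra pass made by a standalone PRM
    return prm.reward_head(hidden)
```

In this toy setup the shared variant makes one backbone pass per trajectory and the separate-PRM variant makes two; the real saving depends on how small the reward head is relative to the backbone.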
Quick Start & Requirements
Requires verl, flash_attn==2.7.4.post1, and other dependencies listed in requirements.txt. Installation is via pip install -e verl and pip install -r requirements.txt.
Highlighted Details
Maintenance & Community
The project is associated with authors from various institutions, as indicated by the citation. Community channels or specific maintainer information are not detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. The presence of code and model weights implies an open-source release, but specific terms for commercial use or closed-source linking are not provided.
Limitations & Caveats
The evaluation pipeline requires setting up separate API endpoints for reward and policy models, which adds complexity. Performance on LiveCodeBench v5 is provided for some models, but a full suite of benchmarks is not detailed. The project is presented as a research release, and long-term maintenance is not guaranteed.
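To make the two-endpoint setup concrete, here is a minimal sketch that assumes evaluation samples several answers from the policy endpoint and keeps the one the reward endpoint scores highest. The README summary does not confirm this exact flow; `best_of_n`, `PolicyFn`, and `RewardFn` are hypothetical names, and the two callables stand in for whatever HTTP clients the real endpoints require.

```python
from typing import Callable, Tuple

PolicyFn = Callable[[str], str]          # prompt -> one sampled answer (hypothetical client)
RewardFn = Callable[[str, str], float]   # (prompt, answer) -> score (hypothetical client)


def best_of_n(prompt: str, sample: PolicyFn, score: RewardFn, n: int) -> Tuple[str, float]:
    """Sample n answers from the policy and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample(prompt) for _ in range(n)]
    scored = [(score(prompt, answer), answer) for answer in candidates]
    # max() keeps the first candidate on ties, so results are deterministic
    # for a fixed sampling order.
    best_score, best_answer = max(scored, key=lambda pair: pair[0])
    return best_answer, best_score
```

Passing the clients in as plain callables keeps the selection logic testable with stubs before either endpoint is deployed.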