MetaStone-AI
Advanced LLM for complex reasoning
XBai-o4 is an open-source large language model family designed for complex reasoning tasks, targeting researchers and developers seeking high-quality reasoning trajectories. It offers competitive performance on benchmarks like AIME and LiveCodeBench, aiming to provide a cost-effective alternative to proprietary models.
How It Works
XBai-o4 utilizes a novel "reflective generative form" that unifies "Long-CoT Reinforcement Learning" and "Process Reward Learning." This approach enables a single model to perform deep reasoning and select high-quality reasoning paths. By sharing a backbone network between Process Reward Models (PRMs) and policy models, XBai-o4 achieves a 99% reduction in PRM inference cost, leading to faster and more accurate responses.
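The path-selection idea described above can be illustrated with a minimal best-of-N sketch. It is not the project's implementation: the `Trajectory` type, the `trajectory_score` and `select_best` helpers, and the geometric-mean aggregation of step scores are all assumptions made for illustration. The step scores are supplied as plain numbers, so the sketch does not model the shared backbone.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One sampled reasoning path (hypothetical type for this sketch)."""
    answer: str
    step_scores: List[float]  # assumed per-step reward scores in (0, 1]


def trajectory_score(step_scores: List[float]) -> float:
    """Aggregate step scores with a geometric mean (an assumed rule)."""
    if not step_scores:
        return 0.0
    # Clamp each score so that log() never receives zero.
    log_sum = sum(math.log(max(s, 1e-12)) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def select_best(candidates: List[Trajectory]) -> Trajectory:
    """Best-of-N: return the candidate with the highest aggregate score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda t: trajectory_score(t.step_scores))


# Illustrative scores only; these are not measured values.
candidates = [
    Trajectory("42", [0.9, 0.8, 0.95]),
    Trajectory("41", [0.99, 0.2, 0.9]),
    Trajectory("42", [0.7, 0.7, 0.7]),
]
best = select_best(candidates)
print(best.answer)
```

A geometric mean penalizes a single weak step more strongly than an arithmetic mean would, which is why the second candidate loses despite two high-scoring steps. Whether XBai-o4 aggregates scores this way is not stated here.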
Quick Start & Requirements
Requires verl, flash_attn==2.7.4.post1, and the other dependencies listed in requirements.txt. Install with pip install -e verl, then pip install -r requirements.txt.
Highlighted Details
Maintenance & Community
The project is associated with authors from various institutions, as indicated by the citation. Community channels or specific maintainer information are not detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. The presence of code and model weights implies an open-source release, but specific terms for commercial use or closed-source linking are not provided.
Limitations & Caveats
The evaluation pipeline requires setting up separate API endpoints for reward and policy models, which adds complexity. Performance on LiveCodeBench v5 is provided for some models, but a full suite of benchmarks is not detailed. The project is presented as a research release, and long-term maintenance is not guaranteed.