Math/code reasoner models trained with RL
Skywork-OR1 is a series of math and code reasoning large language models, including specialized math models and general-purpose reasoning models. It targets researchers and developers working to advance the state of the art in AI reasoning, and reports strong performance on benchmarks such as AIME and LiveCodeBench.
How It Works
The models are trained with large-scale rule-based reinforcement learning on carefully curated datasets and training recipes. The approach aims to improve logical deduction and problem solving in both mathematical and coding domains. It is distinguished by a multi-stage training pipeline and by Avg@K, an evaluation metric that averages a model's score over K independent attempts per problem to give a more robust performance assessment.
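As a rough illustration of the Avg@K idea described above, the sketch below averages per-problem accuracy over K sampled attempts. The function name `avg_at_k` and the input layout (one list of pass/fail flags per problem) are assumptions made for this example and are not taken from the Skywork-OR1 codebase.

```python
from statistics import mean


def avg_at_k(results: list[list[bool]]) -> float:
    """Return Avg@K as a percentage.

    results[i][j] is True when attempt j on problem i was judged correct.
    Every problem must have the same number of attempts (K).
    This layout is a hypothetical choice for illustration only.
    """
    if not results:
        raise ValueError("results must contain at least one problem")
    k = len(results[0])
    if k == 0 or any(len(attempts) != k for attempts in results):
        raise ValueError("every problem needs the same non-zero number of attempts")
    # Accuracy of each problem across its K attempts, then the mean over problems.
    per_problem = [sum(attempts) / k for attempts in results]
    return 100.0 * mean(per_problem)


if __name__ == "__main__":
    # Two problems, K = 2 attempts each.
    print(avg_at_k([[True, False], [True, True]]))
```

Because every attempt counts equally, a single lucky sample moves the score far less than it would under a best-of-K metric, which is why averaging is described as the more robust choice.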
Quick Start & Requirements
Install with Docker:
docker pull whatcanyousee/verl:vemlp-th2.4.0-cu124-vllm0.6.3-ray2.10-te2.0-megatron0.11.0-v0.0.6
or with Conda:
conda create -n verl python==3.10
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
Highlighted Details
Skywork-OR1-Math-7B achieves 69.8 on AIME24 and 52.3 on AIME25 (Avg@32).
Skywork-OR1-32B-Preview matches DeepSeek-R1's performance on math and coding tasks.
Skywork-OR1-7B-Preview outperforms similarly sized models in math and coding.
Maintenance & Community
The project is actively maintained by SkyworkAI. Community resources include a GitHub repository and a Notion blog detailing training recipes and experimental results.
Licensing & Compatibility
The models are trained on top of DeepSeek-R1-Distill models and use a custom fork of the verl project. Specific licensing details for Skywork-OR1 models are not explicitly stated in the README, but the underlying components may have their own licenses.
Limitations & Caveats
The README mentions "Preview" for some models, indicating they may not be the final release versions. A technical report is also pending release. The project relies on a custom fork of verl, which might introduce dependencies or divergence from the original verl project.