Scaling RL for advanced reasoning models
POLARIS is an open-source post-training recipe that uses reinforcement learning (RL) to improve the reasoning capabilities of large language models. It targets researchers and developers who want better model performance on complex reasoning tasks, and it reports significant gains over base models as well as benchmark results that surpass leading commercial systems.
How It Works
POLARIS employs a multi-stage RL training process, building upon existing advanced reasoning models like Qwen3. The approach involves careful data filtering and preparation, including a 53K-sample dataset, and fine-tuning with RL to scale performance. This post-training optimization strategy is designed to elevate the reasoning abilities of models without requiring foundational architectural changes.
Quick Start & Requirements
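The pinned package versions and the environment-variable requirement listed in this section can be checked before a run. The following is a minimal sketch, assuming exact version matches are wanted; `PINS` and `check_environment` are hypothetical helper names for illustration and are not part of POLARIS.

```python
import os
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the requirements listed in this section.
PINS = {"transformers": "4.51.0", "vllm": "0.8.4", "tensordict": "0.6.2"}


def check_environment(pins=PINS, environ=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of the POLARIS codebase.
    """
    environ = os.environ if environ is None else environ
    problems = []
    for package, wanted in pins.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {wanted})")
            continue
        # Assumption: an exact string match is required.
        if found != wanted:
            problems.append(f"{package}=={found} found, expected {wanted}")
    # The requirements say this variable must be unset, so any value counts.
    if "VLLM_ATTENTION_BACKEND" in environ:
        problems.append("VLLM_ATTENTION_BACKEND is set; unset it before running")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks consistent"]:
        print(line)
```

Because the comparison is an exact string match, a locally built package whose version carries a suffix would be flagged; loosen the check if that is too strict for your setup.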
Install with pip install -e ./verl and pip install -e ./
Pinned dependencies: transformers==4.51.0, vllm==0.8.4, tensordict==0.6.2.
Ensure VLLM_ATTENTION_BACKEND is unset.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats