ChenxinAn-fdu: Scaling RL for advanced reasoning models
POLARIS is an open-source post-training recipe that enhances reasoning capabilities of large language models using reinforcement learning (RL). It targets researchers and developers seeking to improve model performance on complex reasoning tasks, offering significant gains over base models and outperforming leading commercial systems in benchmark evaluations.
How It Works
POLARIS employs a multi-stage RL training process, building upon existing advanced reasoning models like Qwen3. The approach involves careful data filtering and preparation, including a 53K-sample dataset, and fine-tuning with RL to scale performance. This post-training optimization strategy is designed to elevate the reasoning abilities of models without requiring foundational architectural changes.
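Recipes of this kind typically begin by filtering the training pool to problems the base model finds neither trivial nor impossible, so that RL updates come from informative examples. The sketch below illustrates that idea only; the function name, thresholds, and data layout are illustrative assumptions, not the POLARIS implementation.

```python
# Hypothetical sketch of difficulty-based data filtering for RL post-training.
# Thresholds and names are assumptions for illustration, not from the POLARIS repo.

def filter_by_difficulty(samples, pass_rates, low=0.1, high=0.9):
    """Keep problems the base model neither always solves nor always fails.

    samples    -- list of training problems
    pass_rates -- per-problem fraction of correct base-model rollouts
    """
    kept = []
    for sample, rate in zip(samples, pass_rates):
        if low <= rate <= high:  # discard too-easy and too-hard problems
            kept.append(sample)
    return kept

# Toy usage: four problems with base-model pass rates.
data = ["p1", "p2", "p3", "p4"]
rates = [0.0, 0.5, 0.95, 0.3]
print(filter_by_difficulty(data, rates))  # p1 is too hard, p3 too easy
```

A curated pool like this (the 53K-sample dataset mentioned above) is then what the multi-stage RL fine-tuning consumes.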
Quick Start & Requirements
Install the bundled verl framework and the project itself in editable mode: pip install -e ./verl, then pip install -e .
Pinned dependencies: transformers==4.51.0, vllm==0.8.4, tensordict==0.6.2. Ensure VLLM_ATTENTION_BACKEND is unset.

Highlighted Details
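The quick-start steps above can be sketched as a shell session; this assumes you are at the repository root and is an illustrative reading of the instructions, not a script shipped with the project.

```shell
# Editable installs of the verl training framework and the project itself
pip install -e ./verl
pip install -e .

# Pin the dependency versions the recipe was tested against
pip install "transformers==4.51.0" "vllm==0.8.4" "tensordict==0.6.2"

# Let vLLM choose its attention backend automatically
unset VLLM_ATTENTION_BACKEND
```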
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Last commit: 1 month ago. Activity status: Inactive.