POLARIS by ChenxinAn-fdu

Scaling RL for advanced reasoning models

Created 3 months ago
610 stars

Top 53.9% on SourcePulse

Project Summary

POLARIS is an open-source post-training recipe that enhances reasoning capabilities of large language models using reinforcement learning (RL). It targets researchers and developers seeking to improve model performance on complex reasoning tasks, offering significant gains over base models and outperforming leading commercial systems in benchmark evaluations.

How It Works

POLARIS employs a multi-stage RL training process, building on existing advanced reasoning models such as Qwen3. The recipe combines careful data filtering and preparation, yielding a 53K-sample training set, with RL fine-tuning to scale performance. This post-training strategy is designed to elevate a model's reasoning ability without requiring foundational architectural changes.
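As a rough illustration of the data-filtering idea, the sketch below keeps only problems of intermediate difficulty, since items the base model always solves or always fails contribute little RL signal. The pass_rate field, thresholds, and function name are assumptions for illustration, not POLARIS's actual pipeline.

    # Hypothetical sketch of difficulty-based filtering for RL post-training.
    # Assumption: each sample records a pass rate, i.e. the fraction of
    # base-model rollouts that solved it. POLARIS's real pipeline may differ.

    def filter_by_difficulty(samples, low=0.0, high=1.0):
        """Drop samples the base model always solves or always fails."""
        return [s for s in samples if low < s["pass_rate"] < high]

    pool = [
        {"problem": "trivial arithmetic", "pass_rate": 1.0},   # too easy: dropped
        {"problem": "hard olympiad item", "pass_rate": 0.25},  # kept
        {"problem": "unsolvable as posed", "pass_rate": 0.0},  # too hard: dropped
    ]
    print(filter_by_difficulty(pool))  # prints only the 0.25 pass-rate sample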

Quick Start & Requirements

  • Install via pip: pip install -e ./verl, then pip install -e .
  • Prerequisites: transformers==4.51.0, vllm==0.8.4, tensordict==0.6.2. Ensure VLLM_ATTENTION_BACKEND is unset (a sanity-check sketch follows this list).
  • Demo and evaluation scripts are provided.
  • Training requires substantial GPU resources (e.g., 32 H800 GPUs for 10 days for a 4B model).
  • Official resources: Notion, Hugging Face Models, Hugging Face Dataset.
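The pinned versions above matter in practice; a minimal, unofficial sanity check (standard-library Python only) might look like this:

    import importlib.metadata as md
    import os

    # Unofficial check that the pinned dependencies listed above are installed.
    for pkg, want in {"transformers": "4.51.0",
                      "vllm": "0.8.4",
                      "tensordict": "0.6.2"}.items():
        have = md.version(pkg)
        assert have == want, f"{pkg}: expected {want}, found {have}"

    # POLARIS expects VLLM_ATTENTION_BACKEND to be unset.
    os.environ.pop("VLLM_ATTENTION_BACKEND", None)

Note that removing the variable inside Python only affects the current process; for launcher scripts, unset it in the shell instead.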

Highlighted Details

  • Achieves significant performance improvements on reasoning benchmarks like AIME24 and AIME25.
  • Outperforms commercial models such as Claude-4-Opus and Grok-3-Beta in reported benchmarks.
  • Supports models up to 7B parameters, with plans for a Coder version.
  • Training and evaluation codebase built on Verl, with multi-node training support via Ray (see the sketch below).
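For multi-node runs, the usual Ray pattern (generic Ray usage, not POLARIS-specific; the commands in the comments are standard Ray CLI) is to start a head node, attach workers, and have the training script connect to the existing cluster:

    import ray

    # Generic multi-node Ray setup, shown for illustration:
    #   head node:    ray start --head
    #   worker nodes: ray start --address=<head-ip>:6379
    # The training process then attaches to the running cluster:
    ray.init(address="auto")

    # Inspect the aggregate resources Ray sees across all nodes (e.g. GPUs).
    print(ray.cluster_resources())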

Maintenance & Community

  • Developed by HKU NLP Group and Bytedance Seed.
  • Open-sourced dataset, code, and training details.
  • Updates are announced on Twitter.

Licensing & Compatibility

  • The repository does not explicitly state a license. The underlying models (Qwen3, DeepSeek) are governed by their own licenses, and commercial-use compatibility is not specified.

Limitations & Caveats

  • Requires significant computational resources for training.
  • Evaluation reportedly benefits from higher sampling temperatures and longer response lengths than the defaults (see the sketch after this list).
  • Models are released as "Preview" versions, indicating ongoing development and likely changes.
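To make the second caveat concrete, here is a hedged vLLM sketch; the checkpoint path and the exact temperature and length values are placeholders, not the project's published settings:

    from vllm import LLM, SamplingParams

    # Illustrative only: evaluation reportedly benefits from a higher sampling
    # temperature and a far longer generation budget than vLLM's defaults.
    # The model path and the specific numbers below are placeholders.
    llm = LLM(model="path/to/polaris-checkpoint")
    params = SamplingParams(temperature=1.4, top_p=1.0, max_tokens=65536)
    out = llm.generate(["Prove that the sum of two even integers is even."], params)
    print(out[0].outputs[0].text)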
Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 27 stars in the last 30 days

Explore Similar Projects

DeepSeek-R1 by deepseek-ai
Reasoning models research paper
Top 0.1% on SourcePulse
91k stars
Created 8 months ago
Updated 3 months ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 19 more.