POLARIS by ChenxinAn-fdu

Scaling RL for advanced reasoning models

created 1 month ago
559 stars

Top 57.2% on SourcePulse

Project Summary

POLARIS is an open-source post-training recipe that enhances reasoning capabilities of large language models using reinforcement learning (RL). It targets researchers and developers seeking to improve model performance on complex reasoning tasks, offering significant gains over base models and outperforming leading commercial systems in benchmark evaluations.

How It Works

POLARIS employs a multi-stage RL training process, building upon existing advanced reasoning models like Qwen3. The approach involves careful data filtering and preparation, including a 53K-sample dataset, and fine-tuning with RL to scale performance. This post-training optimization strategy is designed to elevate the reasoning abilities of models without requiring foundational architectural changes.
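The data-preparation stage can be illustrated with a minimal sketch. POLARIS describes its filtering only at a high level, so the approach below (dropping prompts the base model always or never solves, as measured by a pass rate over sampled rollouts) and all thresholds and field names are assumptions, not the project's actual recipe:

```python
# Illustrative difficulty-based filtering for RL post-training data.
# The pass-rate criterion and thresholds are assumptions, not the POLARIS recipe.

def filter_by_difficulty(samples, low=0.1, high=0.9):
    """Keep prompts the base model solves sometimes but not always.

    Each sample is a dict with a 'pass_rate' in [0, 1], e.g. the fraction
    of N sampled rollouts that reached the correct answer.
    """
    return [s for s in samples if low <= s["pass_rate"] <= high]

samples = [
    {"prompt": "trivial", "pass_rate": 1.0},   # too easy: no learning signal
    {"prompt": "medium", "pass_rate": 0.5},    # informative for RL
    {"prompt": "unsolved", "pass_rate": 0.0},  # too hard: reward is always zero
]
kept = filter_by_difficulty(samples)
```

The intuition is that RL reward signals are uninformative at both extremes: always-solved prompts give no gradient toward improvement, and never-solved prompts yield constant zero reward.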

Quick Start & Requirements

  • Install via pip: pip install -e ./verl and pip install -e ./.
  • Prerequisites: transformers==4.51.0, vllm==0.8.4, tensordict==0.6.2. Ensure VLLM_ATTENTION_BACKEND is unset.
  • Demo and evaluation scripts are provided.
  • Training requires substantial GPU resources (e.g., 32 H800 GPUs for 10 days for a 4B model).
  • Official resources: Notion, Hugging Face Models, Hugging Face Dataset.
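The installation bullets above can be collected into one setup fragment. This is a sketch that assumes the repository has already been cloned and is the current working directory; the pinned versions come from the prerequisites listed above:

```shell
# Setup sketch for POLARIS (assumes cwd is the cloned repository root).
pip install transformers==4.51.0 vllm==0.8.4 tensordict==0.6.2  # pinned prerequisites
pip install -e ./verl   # bundled Verl training framework
pip install -e ./.      # POLARIS itself
unset VLLM_ATTENTION_BACKEND  # must be unset, per the project's instructions
```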

Highlighted Details

  • Achieves significant performance improvements on reasoning benchmarks like AIME24 and AIME25.
  • Outperforms commercial models such as Claude-4-Opus and Grok-3-Beta in reported benchmarks.
  • Supports models up to 7B parameters, with plans for a Coder version.
  • Training and evaluation codebase built on Verl, with multi-node training support via Ray.

Maintenance & Community

  • Developed by HKU NLP Group and Bytedance Seed.
  • Open-sourced dataset, code, and training details.
  • Twitter for updates.

Licensing & Compatibility

  • The repository does not explicitly state a license. The underlying models (Qwen3, Deepseek) have their own licenses. Compatibility for commercial use is not specified.

Limitations & Caveats

  • Requires significant computational resources for training.
  • For evaluation, higher temperatures and longer response lengths than the defaults are recommended for optimal performance.
  • The project is presented with "Preview" model releases, indicating potential for ongoing development and changes.
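The evaluation caveat about sampling settings can be made concrete. The numeric values below are illustrative placeholders, not the settings POLARIS recommends; the point is only the direction of the change relative to typical defaults:

```python
# Hypothetical sampling settings for reasoning evaluation.
# Values are placeholders, not the numbers recommended by POLARIS.
DEFAULT_SAMPLING = {"temperature": 0.7, "max_new_tokens": 8192}

# Reasoning-heavy evaluation favors more exploration (higher temperature)
# and room for long chains of thought (longer generations), so both knobs
# are raised relative to the defaults.
EVAL_SAMPLING = {"temperature": 1.4, "max_new_tokens": 32768}
```

With the pinned vllm==0.8.4, settings like these would typically be passed through `SamplingParams(temperature=..., max_tokens=...)`.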

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 18
  • Star History: 143 stars in the last 30 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

Training codebase for instruction-following language models

  • Top 0.7% · 3k stars
  • created 2 years ago · updated 12 hours ago