Scalable RL solution for advanced reasoning of language models
Top 25.9% on sourcepulse
PRIME is an open-source solution for enhancing large language model (LLM) reasoning capabilities through reinforcement learning (RL) with implicit process rewards. It targets researchers and developers aiming to improve LLM performance on complex tasks like math and coding, offering a scalable alternative to imitation learning by providing dense, online-updatable reward signals.
How It Works
PRIME leverages an "Implicit Process Reward Model" (Implicit PRM) trained as an outcome reward model (ORM). This approach avoids the need for explicit process labels, instead learning a Q-function that provides token-level rewards. The Implicit PRM is updated online with outcome verifiers, mitigating distribution shift and scalability issues. PRIME integrates this into an RL framework, where both the policy model and PRM are initialized from a Supervised Fine-Tuned (SFT) model. During RL iterations, rollouts are generated, scored by the PRM and an outcome verifier, and the PRM is updated. Combined outcome and process rewards then update the policy model, often using PPO.
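The core reward computation can be illustrated with a short sketch. This is a minimal, illustrative example assuming PyTorch tensors of per-token log-probabilities; the function names (`implicit_process_rewards`, `combined_return`) and the `beta` value are hypothetical and are not the repository's actual API.

```python
import torch

def implicit_process_rewards(prm_logprobs, ref_logprobs, beta=0.05):
    """Token-level process rewards from an implicit PRM.

    The implicit PRM is a language model trained with an outcome (ORM)
    objective; its dense reward is the scaled log-likelihood ratio against
    the reference (SFT) model at each token:
        r_t = beta * (log pi_prm(y_t | y_<t) - log pi_ref(y_t | y_<t))
    Both inputs have shape [batch, seq_len], gathered at the sampled tokens.
    """
    return beta * (prm_logprobs - ref_logprobs)


def combined_return(process_rewards, outcome_reward, response_mask):
    """Mix dense process rewards with the sparse verifier outcome reward.

    Simplified sketch: dense rewards are summed over response tokens and the
    verifier's outcome reward is added once per sequence. PRIME's actual
    advantage estimation (baselines, per-token returns) lives in its
    training code and may differ.
    """
    dense = (process_rewards * response_mask).sum(dim=-1)
    return dense + outcome_reward
```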
Quick Start & Requirements
Install via pip; dependencies include torch, transformers, vllm, and tqdm. vLLM is used for efficient LLM serving, as sketched below.
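The sketch below shows how rollouts can be generated with vLLM's offline inference API (`LLM` and `SamplingParams`). The checkpoint path is a placeholder, not the repository's actual entry point; consult the repo's training scripts for the real pipeline.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint path; substitute the SFT/policy model being trained.
llm = LLM(model="path/to/policy-checkpoint")

# Sample several candidate responses per prompt for RL rollouts.
sampling = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=1024, n=4)

prompts = ["Prove that the sum of two even integers is even."]
outputs = llm.generate(prompts, sampling)

for request_output in outputs:
    for candidate in request_output.outputs:
        print(candidate.text)
```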
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not specify a license, which is important for assessing commercial usability. Performance claims are strong, but the datasets and evaluation methodology behind benchmarks such as AIME, MATH-500, and AMC are detailed only in the paper and require further review for full context.