RLP by NVlabs

Reinforcement learning pre-training for enhanced reasoning

Created 8 months ago
250 stars

Top 100.0% on SourcePulse

Project Summary

RLP (Reinforcement Learning Pre-training) addresses the lack of explicit "thinking" in LLM pre-training. It introduces an objective that treats Chain-of-Thought (CoT) as an action, rewarded by its information gain on the next token. This verifier-free, dense reward mechanism is intended to strengthen reasoning foundations during pre-training, which benefits researchers and engineers seeking more robust LLMs.

How It Works

RLP reframes pre-training by treating Chain-of-Thought (CoT) generation as an action taken before next-token prediction. The action is rewarded by the information gain it contributes to predicting the observed next token. Because the reward is computed against the observed text itself, it is dense and verifier-free, so it applies directly to standard text pre-training corpora and lets reasoning behavior be trained during pre-training rather than only afterward.
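The reward described above can be illustrated with a toy calculation. The sketch below is not the official implementation (which is unreleased); it assumes that "information gain" means the increase in log-likelihood of the observed next token when the model conditions on a sampled chain of thought, compared with a no-thought baseline. The function names and the probability tables are hypothetical.

```python
import math

def log_prob(dist, token):
    """Log-probability of `token` under a dict mapping tokens to probabilities."""
    return math.log(dist[token])

def information_gain_reward(p_with_cot, p_without_cot, next_token):
    """Assumed reward form: log-likelihood of the observed next token with
    the chain of thought, minus the same quantity without it."""
    return log_prob(p_with_cot, next_token) - log_prob(p_without_cot, next_token)

# Hypothetical next-token distributions for the context "2 + 2 =".
p_without_cot = {"4": 0.5, "5": 0.3, "22": 0.2}   # no thought before predicting
p_helpful_cot = {"4": 0.9, "5": 0.05, "22": 0.05}  # thought that aids prediction
p_harmful_cot = {"4": 0.2, "5": 0.6, "22": 0.2}    # thought that misleads

observed = "4"
helpful_reward = information_gain_reward(p_helpful_cot, p_without_cot, observed)
harmful_reward = information_gain_reward(p_harmful_cot, p_without_cot, observed)

print(f"helpful thought reward: {helpful_reward:+.4f}")  # positive
print(f"harmful thought reward: {harmful_reward:+.4f}")  # negative
```

Under this assumed form, a thought is rewarded only to the extent that it makes the actual next token more predictable, and every position in ordinary text yields a reward without any external verifier.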

Quick Start & Requirements

The official code repository is slated for release soon. Specific installation instructions, dependencies (e.g., Python, CUDA), and hardware prerequisites are not yet detailed. Links to official quick-start guides, documentation, or demos are also unavailable.

Highlighted Details

  • Qwen3 1.7B Base: RLP boosts math/science performance (+19% avg. over base, +17% over CPT). The gains compound with post-training.
  • Nemotron Nano 12B v2 Base: Applied for 250M tokens, RLP significantly outperforms the base model trained on 20T tokens (+35% avg.), with the largest improvement in science (+23 pts).
  • Architecture Agnostic: Generalizes across models, including hybrid Mamba-Transformer designs.
  • Efficiency: The reported gains come without extra compute or extensive token exposure.
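To put the Nemotron token figures above in proportion, the short calculation below compares the 250M tokens used for RLP with the 20T tokens reported for the base model. It uses only the two numbers quoted in the list; the variable names are illustrative, and the result describes token counts only, not compute.

```python
# Token counts as quoted in the highlights above.
rlp_tokens = 250_000_000            # 250M tokens of RLP
base_tokens = 20_000_000_000_000    # 20T tokens for the base model

# How many times more tokens the base model saw than the RLP stage.
ratio = base_tokens // rlp_tokens

print(f"Base pre-training used {ratio:,}x more tokens than the RLP stage")
```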

Maintenance & Community

Associated with NVIDIA Corporation, with contributions from Ali Hatamizadeh, Syeda Nahida Akter, Shrimai Prabhumoye, Jan Kautz, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, and Yejin Choi. No community channels or roadmap links are provided.

Licensing & Compatibility

Copyrighted by NVIDIA Corporation (© 2025), all rights reserved. This proprietary licensing likely restricts commercial use or integration into closed-source projects without explicit permission. A standard open-source license is not specified.

Limitations & Caveats

The official implementation has been announced but not yet released, so the project cannot yet be used directly. The README gives no details on hardware requirements, setup procedures, or limitations beyond the pending code release.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

0.1%
4k
RL recipe for reasoning ability in models
Created 1 year ago
Updated 5 months ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Anton Bukov (Cofounder of 1inch Network), and 3 more.

HRM by sapientinc

0.3%
12k
Hierarchical reasoning for complex tasks
Created 10 months ago
Updated 1 month ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 19 more.

DeepSeek-R1 by deepseek-ai

0.1%
92k
Reasoning models research paper
Created 1 year ago
Updated 11 months ago