LeapLabTHU
Investigating RLVR's impact on LLM reasoning
Top 86.1% on SourcePulse
This repository provides code for a paper investigating whether Reinforcement Learning with Verifiable Rewards (RLVR) genuinely extends the reasoning ability of Large Language Models (LLMs) or mainly makes solutions the base model can already reach easier to sample. It targets AI researchers and engineers, offering empirical evidence to guide the use of RLVR by separating its effect on reasoning boundaries from its effect on sampling efficiency.
How It Works
The project evaluates RL-trained LLMs against their base models using the pass@k metric across mathematical and coding benchmarks. The analysis shows that RLVR improves sampling efficiency at small k, but base models surpass the RL-trained ones at larger k. This suggests that RLVR may narrow the set of problems a model can solve rather than expand it. Responses are generated with vLLM, and response diversity is obtained through seed management and sequential progression of the random state across samples.
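The pass@k comparison described above can be sketched in a few lines. The block below is a minimal illustration, not the repository's evaluation code: it uses the widely used unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn per problem and c is the number of correct ones. The function names and the per-problem counts are hypothetical, chosen only to show how a model can lead at k = 1 and trail at larger k.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples drawn, c: samples that were correct, k: sampling budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a hit.
        return 1.0
    # 1 - P(all k samples drawn without replacement are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over a benchmark, one correct-count per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts (made up for illustration, not results from the paper):
# 4 problems, 8 samples each.
N_SAMPLES = 8
rl_counts = [8, 8, 0, 0]    # solves half the problems, and does so every time
base_counts = [2, 2, 1, 1]  # solves every problem, but only occasionally

rl_at_1 = mean_pass_at_k(rl_counts, N_SAMPLES, 1)      # 0.5
base_at_1 = mean_pass_at_k(base_counts, N_SAMPLES, 1)  # 0.1875
rl_at_8 = mean_pass_at_k(rl_counts, N_SAMPLES, 8)      # 0.5
base_at_8 = mean_pass_at_k(base_counts, N_SAMPLES, 8)  # 1.0
```

With these invented counts, the RL-style model wins at k = 1 because its correct answers are concentrated, while the base-style model wins at k = 8 because it has a nonzero chance on every problem. This is the shape of the crossover the summary describes.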
Quick Start & Requirements
Evaluation code for DeepCoder and Math tasks is released. Specific installation or execution commands are not detailed, but prerequisites likely include a Python environment, vLLM, and potentially specific LLM checkpoints (e.g., DAPO, Oat-Zero). The primary reference is the arXiv paper: https://arxiv.org/abs/2504.13837.
Highlighted Details
Maintenance & Community
The project originates from Tsinghua University and Shanghai Jiao Tong University. The README mentions no community channels (Discord, Slack) or roadmap links.
Licensing & Compatibility
The README does not explicitly state the software license. This absence may pose compatibility concerns for commercial use or integration into closed-source projects until clarified.
Limitations & Caveats
The core finding is that RLVR may not expand an LLM's fundamental reasoning capabilities and may even narrow them, which calls for re-evaluating its use as a way to advance core reasoning skills. No unsupported platforms or known bugs are documented.