Curated papers on label-free reinforcement learning for LLMs
This repository curates research papers on label-free Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Models (LLMs). It is a resource for researchers and practitioners exploring methods to train and improve LLMs without external, human-annotated reward signals, focusing instead on self-supervision and intrinsic motivation.
How It Works
The collection highlights papers that explore various RLVR techniques, including self-play, entropy minimization, confidence maximization, and surrogate signals derived from output format or length. These approaches aim to enable LLMs to learn and refine their reasoning capabilities by generating their own training signals, reducing reliance on costly external supervision.
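For intuition, here is a minimal sketch of one such signal: an entropy-minimization reward computed from the model's own next-token distributions. This is an illustrative example, not the method of any particular paper in the collection; the helper names (sequence_entropy, intrinsic_reward) and the dictionary-based distribution format are assumptions made for the sketch.

```python
import math

def sequence_entropy(token_dists):
    """Mean per-token entropy (in nats) over a generated sequence.

    token_dists: a list with one entry per generation step, each a dict
    mapping candidate tokens to their log-probabilities (a full or
    truncated next-token distribution).
    """
    entropies = [
        -sum(math.exp(lp) * lp for lp in dist.values())
        for dist in token_dists
    ]
    return sum(entropies) / len(entropies)

def intrinsic_reward(token_dists):
    """Entropy-minimization signal: confident (low-entropy) generations
    receive higher reward, with no reference answer required."""
    return -sequence_entropy(token_dists)

# Toy usage: two generation steps, the first near-certain, the second uncertain.
steps = [
    {"A": math.log(0.97), "B": math.log(0.03)},
    {"A": math.log(0.55), "B": math.log(0.45)},
]
print(intrinsic_reward(steps))  # closer to 0 (less negative) => more confident
```

Concrete formulations in the literature differ, for example token-level versus sequence-level entropy, or the confidence of a majority-vote answer, but most reduce to scoring the model's own outputs without a reference label.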
Maintenance & Community
This is a curated list of papers, maintained by Qingyang Zhang, Haitao Wu, and Yi Ding. Contributions are welcome; missing papers can be suggested by opening an issue.
Licensing & Compatibility
The repository itself does not contain code or models; it is a collection of links to research papers. The licensing of individual papers is determined by their respective publishers or preprint servers (e.g., arXiv).
Limitations & Caveats
This repository is a literature collection and does not provide implementation code, datasets, or runnable examples. Users must refer to the individual papers for technical details, prerequisites, and experimental setups.
Last updated 2 months ago; the repository is currently marked inactive.