Label-Free-RLVR by QingyangZhang

Curated papers on label-free reinforcement learning for LLMs

Created 3 months ago
267 stars

Top 95.9% on SourcePulse

Project Summary

This repository curates research papers on label-free Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Models (LLMs). It is aimed at researchers and practitioners who want to train and improve LLMs without external, human-annotated reward signals, relying instead on self-supervision and intrinsic motivation.

How It Works

The collected papers cover RLVR techniques including self-play, entropy minimization, confidence maximization, and surrogate signals derived from output format or length. These approaches let an LLM generate its own training signals to refine its reasoning, which reduces reliance on costly external supervision.
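To make the idea of a self-generated training signal concrete, here is a minimal Python sketch of three label-free rewards of the kinds named above. It is not taken from the repository (which contains no code) or from any specific paper in the list. The function names, the use of per-token probability lists as input, and the `\boxed{...}` answer convention are all assumptions chosen for illustration.

```python
import math
import re


def mean_token_entropy(token_distributions):
    """Average Shannon entropy (in nats) over per-token probability distributions.

    `token_distributions` is a hypothetical input: one list of probabilities
    per generated token, each summing to 1.
    """
    entropies = [
        -sum(p * math.log(p) for p in probs if p > 0.0)
        for probs in token_distributions
    ]
    return sum(entropies) / len(entropies)


def entropy_reward(token_distributions):
    """Entropy-minimization signal: lower entropy gives a higher reward."""
    return -mean_token_entropy(token_distributions)


def confidence_reward(token_distributions):
    """Confidence-maximization signal: mean probability of the top token."""
    top_probs = [max(probs) for probs in token_distributions]
    return sum(top_probs) / len(top_probs)


def format_reward(response):
    """Format-based surrogate signal (assumed convention): 1.0 if the
    response contains a \\boxed{...} final answer, else 0.0."""
    return 1.0 if re.search(r"\\boxed\{[^{}]+\}", response) else 0.0


if __name__ == "__main__":
    confident = [[0.9, 0.1], [0.8, 0.2]]
    uncertain = [[0.5, 0.5], [0.5, 0.5]]
    print(entropy_reward(confident), entropy_reward(uncertain))
    print(confidence_reward(confident), confidence_reward(uncertain))
    print(format_reward("The answer is \\boxed{42}."))
```

None of these signals checks the answer against a ground-truth label, which is the point of the label-free setting. In an actual training loop such a scalar would be passed to a policy-gradient update; that part is omitted here, and the individual papers should be consulted for how each method defines and uses its reward.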

Highlighted Details

  • Covers a broad spectrum of RLVR strategies, from fully unsupervised methods to those requiring limited data or single examples.
  • Includes papers addressing potential pitfalls and evaluation challenges in RLVR research.
  • Features recent advancements in self-rewarding mechanisms and intrinsic motivation for LLM reasoning.
  • Organized into categories like "RLVR without External Supervision" and "RLVR with Limited Data" for clarity.

Maintenance & Community

This is a curated list of papers, maintained by Qingyang Zhang, Haitao Wu, and Yi Ding. Contributions and suggestions for missing papers are welcome; submit them by opening an issue.

Licensing & Compatibility

The repository itself does not contain code or models; it is a collection of links to research papers. The licensing of individual papers is determined by their respective publishers or preprint servers (e.g., arXiv).

Limitations & Caveats

This repository is a literature collection and does not provide implementation code, datasets, or runnable examples. Users must refer to the individual papers for technical details, prerequisites, and experimental setups.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
