Awesome-RL-for-LRMs by TsinghuaC3I

RL recipes for reasoning, covering models, datasets, reward design, and optimization

Created 11 months ago

2,340 stars

Top 19.1% on SourcePulse

1 Expert Loves This Project

Yiran Wu

Coauthor of AutoGen

Project Summary

This repository is a curated collection of recent advancements in Reinforcement Learning (RL) for reasoning tasks in Large Language Models (LLMs), targeting researchers and engineers in AI. It provides a comprehensive overview of models, datasets, reward designs, optimization methods, and empirical findings, aiming to accelerate progress in developing more capable and efficient AI reasoning systems.

How It Works

The collection focuses on RL techniques applied to LLMs, particularly for improving reasoning in domains such as mathematics, coding, and multimodal understanding. It highlights methods that fine-tune LLMs with reward signals, often derived from final outcomes or explicit rules. Key approaches include Proximal Policy Optimization (PPO) and related methods such as GRPO and VC-PPO, often trained without a KL divergence penalty, as well as newer algorithms like PRIME-RL that use implicit, token-level rewards.
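To make the idea of rule-derived outcome rewards concrete, here is a minimal sketch of how a GRPO-style group-relative advantage could be computed from them. It is not taken from any project in the list. The function names, the `\boxed{...}` answer convention, the exact-match rule, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical outcome reward: 1.0 if the last \\boxed{...} answer
    exactly matches the reference string, otherwise 0.0."""
    answers = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference_answer.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the
    same prompt: subtract the group mean, divide by the group std.
    Assumption: population std plus a small eps for numerical stability."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt whose reference answer is "42".
completions = [
    "So the result is \\boxed{42}",
    "I get \\boxed{41}",
    "I am not sure.",
    "Therefore \\boxed{ 42 }",
]
rewards = [rule_based_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)

if __name__ == "__main__":
    for r, a in zip(rewards, advantages):
        print(f"reward={r:.1f} advantage={a:+.3f}")
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, without a separate value network. If every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.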

Quick Start & Requirements

This is a curated list of projects, not a single installable package. Each project typically requires Python, PyTorch, and Hugging Face Transformers. Specific hardware requirements (e.g., GPUs) and dependencies vary per project. Links to individual project GitHub repositories and Hugging Face models are provided for each entry.
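Before cloning any individual project, it can help to confirm that the commonly required packages are present. The following is a minimal sketch under the assumption that the project in question needs `torch` and `transformers`; `check_environment` is a hypothetical helper, and each project's own requirements file remains the authoritative source.

```python
import importlib.util


def check_environment(packages=("torch", "transformers")):
    """Report whether each package is importable and whether a CUDA GPU
    is visible to PyTorch. The package list is an assumption."""
    status = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    gpu = False
    if status.get("torch"):
        try:
            import torch  # imported lazily so the check works without it

            gpu = bool(torch.cuda.is_available())
        except Exception:
            # A broken install counts as "no GPU" for this check.
            gpu = False
    status["cuda_gpu"] = gpu
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'available' if ok else 'missing'}")
```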

Highlighted Details

Broad Coverage: Encompasses LLM reasoning, multimodal reasoning, and agentic applications.
State-of-the-Art Benchmarks: Many projects claim performance matching or exceeding GPT-4o on benchmarks like AIME, MATH500, and Codeforces.
Efficiency Focus: Explores methods to achieve strong reasoning with smaller models (e.g., 1.5B-7B parameters) and reduced training costs.
Reproducibility: Emphasis on open-source implementations and detailed configuration tables for replication.

Maintenance & Community

The repository is actively updated with recent research (primarily from 2025). It encourages community contributions via pull requests. Specific community channels like Discord or Slack are not explicitly mentioned.

Licensing & Compatibility

The repository itself is likely under a permissive license (e.g., MIT, Apache 2.0), but individual projects linked within it may have different licenses. Users must verify the licensing of each specific model or code implementation for commercial or closed-source use.

Limitations & Caveats

This is a collection of research projects, not a unified framework. Adoption requires evaluating and integrating individual projects, each with its own dependencies, setup complexity, and potential limitations. The rapid pace of development means some projects may be experimental or subject to change.

Health Check

Last Commit: 3 months ago

Responsiveness: 1 day

Star History

62 stars in the last 30 days