Awesome-RL-based-LLM-Reasoning by bruno686

Resource list for RL-based LLM reasoning

Created 5 months ago · 576 stars · Top 56.9% on sourcepulse

Project Summary

This repository curates the latest research, slides, and resources on enhancing Large Language Model (LLM) reasoning capabilities through Reinforcement Learning (RL). It serves as a valuable hub for researchers, engineers, and practitioners seeking to understand and implement RL-driven reasoning techniques in LLMs.

How It Works

The project focuses on various RL paradigms for LLM reasoning, including outcome-based rewards, process-based rewards, and direct policy optimization. It highlights methods that incentivize reasoning, enable self-verification, and improve efficiency through techniques like search algorithms (MCTS, Beam Search) and test-time scaling. The underlying advantage of RL is its ability to optimize LLM behavior through trial and error, leading to more robust and generalized reasoning abilities compared to purely supervised methods.
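The outcome/process distinction recurs throughout the list, so a minimal sketch may help orient readers. The snippet below is illustrative only, not code from the repository: it assumes a GSM8K-style "#### answer" format for the final answer and uses a placeholder step scorer in place of a trained process reward model (PRM).

```python
import re

def outcome_reward(trace: str, gold_answer: str) -> float:
    """Outcome-based reward: grade only the final answer (assumed '#### x' format)."""
    match = re.search(r"####\s*(\S+)", trace)
    return 1.0 if match and match.group(1) == gold_answer else 0.0

def process_reward(steps: list[str], step_scorer) -> float:
    """Process-based reward: score each intermediate step, then aggregate."""
    scores = [step_scorer(step) for step in steps]
    return sum(scores) / len(scores) if scores else 0.0

trace = "Add the units digits.\nCarry the one.\n#### 42"
*steps, _ = trace.split("\n")
print(outcome_reward(trace, "42"))           # 1.0: the final answer matches
print(process_reward(steps, lambda s: 1.0))  # placeholder PRM that accepts every step
```

Outcome rewards are cheap and verifiable but sparse; process rewards give denser credit assignment at the cost of needing a step-level scorer, which is the trade-off many of the listed papers explore.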

Quick Start & Requirements

  • Install: This is a curated list rather than an installable package; to reproduce the referenced methods, clone the linked projects and install their Python dependencies (typically transformers, torch, and sometimes specialized RL libraries).
  • Prerequisites: Python 3.x, PyTorch, and Hugging Face Transformers; a quick environment check is sketched after this list. Specific models or experiments may require significant GPU resources (e.g., 4x RTX 4090 GPUs are cited for TinyZero) and large datasets.
  • Resources: Links to papers, slides, and discussions are provided. Some open-source projects, such as TinyZero and Unsloth-GRPO, offer more direct implementation paths.
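As a quick smoke test of the typical prerequisites before attempting any of the linked recipes, the following sketch loads a small instruction-tuned model with Hugging Face Transformers. The model name is illustrative only; any causal LM will do.

```python
# Verify PyTorch, Transformers, and (optionally) a CUDA GPU are working.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative; swap in your target model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

inputs = tokenizer("2 + 2 =", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```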

Highlighted Details

  • Comprehensive categorization of papers by reward type (outcome, process) and RL approach (policy-based, value-based).
  • Inclusion of surveys and discussions on the current state, limitations, and future directions of LLM reasoning.
  • Links to relevant open-source projects and implementations (e.g., TinyZero, Open-r1, Unsloth-GRPO).
  • Discussion of key RL algorithms such as Q-learning, REINFORCE, PPO, DPO, and GRPO (see the GRPO sketch after this list).
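For orientation, here is a minimal sketch of the group-relative advantage that distinguishes GRPO (as described in the DeepSeekMath paper) from PPO: rewards are normalized within a group of completions sampled for the same prompt, removing the need for a learned value function. This is an illustration of the idea, not code from any of the linked repositories.

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages for one prompt.

    group_rewards: shape (G,), one reward per sampled completion.
    Each completion's advantage is its reward standardized against the group.
    """
    mean, std = group_rewards.mean(), group_rewards.std()
    return (group_rewards - mean) / (std + eps)

# e.g., binary outcome rewards over G = 8 sampled completions
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
print(grpo_advantages(rewards))  # correct completions receive positive advantage
```

Completions that beat the group average are reinforced and the rest are pushed down, which is why GRPO pairs naturally with cheap, verifiable outcome rewards.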

Maintenance & Community

The repository is community-driven, encouraging contributions of new papers and resources. It also links to Zhihu discussions and to work from research groups such as Shanghai AI Lab, DeepMind, and Meta, reflecting active research in the field.

Licensing & Compatibility

The repository itself does not specify a license. However, it links to various research papers and open-source projects, each with its own licensing. Users must consult the licenses of individual components and cited works for compatibility and usage restrictions.

Limitations & Caveats

This is a curated list of resources, not a runnable framework. Implementing the discussed techniques requires significant expertise in LLMs and RL, along with substantial computational resources. Some papers discuss limitations of current LLM reasoning, such as overconfidence and sensitivity to minor input changes.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 113 stars in the last 90 days
