Awesome-GRPO by WangJingyao07

Advanced LLM reinforcement fine-tuning framework

Created 6 months ago
283 stars

Top 92.2% on SourcePulse

View on GitHub
Project Summary

A curated and extensible repository for GRPO (Group Relative Policy Optimization) and its variants, Awesome-GRPO offers researchers and engineers a unified platform for advanced LLM reinforcement fine-tuning. It pairs practical code implementations with a collection of relevant academic papers, streamlining the exploration and application of cutting-edge RL techniques for LLMs. The primary benefit is simplified access to a diverse set of GRPO-based methods for efficient model fine-tuning.

How It Works

The project features a modular codebase designed for concise, switchable implementations of GRPO and its derivatives, allowing users to change optimization strategies with a single flag. It emphasizes practical deployment by integrating with DeepSpeed for efficient distributed training (supporting ZeRO-2/3) and vLLM for high-throughput inference, facilitating scalable LLM fine-tuning workflows. This architecture supports rapid experimentation and the application of various GRPO-style algorithms.
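The core idea shared by GRPO-style algorithms is that each prompt's sampled completions are scored as a group, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The sketch below illustrates that computation; it is an illustrative example, not code from this repository, and the function name `group_relative_advantages` is hypothetical.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Compute group-relative advantages, GRPO-style (illustrative sketch).

    rewards: (num_prompts, group_size) scalar rewards, one row per prompt's
             group of sampled completions.
    Returns advantages of the same shape: (r - group_mean) / (group_std + eps).
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two correct and two incorrect completions for one prompt:
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
adv = group_relative_advantages(rewards)
# Correct completions get positive advantage, incorrect ones negative,
# and the advantages within each group sum to (approximately) zero.
```

Because the baseline comes from the group itself, no learned value network is needed, which is what makes these methods comparatively cheap to run at LLM scale.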

Quick Start & Requirements

  • Installation: Clone the repository: git clone https://github.com/WangJingyao07/Awesome-GRPO.git
  • Running:
    • Evaluation: CUDA_VISIBLE_DEVICES=7 python ref_client.py
    • Training: CUDA_VISIBLE_DEVICES=2,3,4,5,6 deepspeed train.py --algo grpo (or other variants like dapo).
  • Prerequisites: CUDA-enabled GPU(s) are required, as indicated by CUDA_VISIBLE_DEVICES. Dependencies include DeepSpeed and vLLM.
  • Resources: The repository includes a papers/ directory with collected PDFs and a CODE/ directory for implementations.

Highlighted Details

  • Supports distributed fine-tuning via DeepSpeed ZeRO-2/3.
  • Features efficient LLM inference using the vLLM engine.
  • Implements KL regularization control and token-level reward processing.
  • Integrates WandB logging for comprehensive experiment tracking.
  • Offers code configurations and descriptions for multiple GRPO variants, including GRPO, DAPO, Dr.GRPO, GTPO, and GMPO.
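The KL regularization and token-level reward processing mentioned above typically combine into a per-token loss: a policy-gradient term weighted by the sequence-level advantage, plus a KL penalty against a frozen reference model. The repository's exact implementation may differ; this is a hedged sketch using Schulman's k3 KL estimator, with the function name `grpo_token_loss` and the default `kl_coef` chosen for illustration.

```python
import torch

def grpo_token_loss(logp: torch.Tensor,
                    logp_ref: torch.Tensor,
                    advantages: torch.Tensor,
                    mask: torch.Tensor,
                    kl_coef: float = 0.04) -> torch.Tensor:
    """Token-level GRPO-style loss with a KL penalty (illustrative sketch).

    logp, logp_ref: (batch, seq_len) log-probs of the sampled tokens under
                    the current policy and the frozen reference model.
    advantages:     (batch,) sequence-level group-relative advantages,
                    broadcast to every completion token.
    mask:           (batch, seq_len) 1 for completion tokens, 0 for padding.
    """
    # k3 estimator of KL(policy || reference), per token: always >= 0.
    ratio = torch.exp(logp_ref - logp)
    kl = ratio - (logp_ref - logp) - 1.0
    # Policy-gradient term: push up log-probs of high-advantage sequences.
    pg = -advantages.unsqueeze(-1) * logp
    per_token = (pg + kl_coef * kl) * mask
    return per_token.sum() / mask.sum().clamp(min=1)
```

Variants such as DAPO and Dr.GRPO mostly differ in how the advantage, clipping, and normalization terms of this objective are defined, which is why a single `--algo` flag can switch between them.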

Maintenance & Community

The project shows recent activity with updates logged through late 2025, indicating ongoing development. Specific details on core contributors, community channels (e.g., Discord, Slack), or a public roadmap are not provided in this README.

Licensing & Compatibility

The README does not specify a software license. Absent a license, default copyright applies, so commercial use, redistribution, or closed-source linking cannot be assumed to be permitted without clarification from the author.

Limitations & Caveats

Several GRPO variants are listed as planned for future releases (marked with ☐) and are not yet implemented in the codebase. These include Pref-GRPO, L2T-GRPO, TreePO, GPO, GiGPO, Flow-GRPO, GRPO-SCS, SGPO, Direct Advantage PO, and Diversity Aware PO.

Health Check
Last Commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

oat by sail-sg

0.3%
652
LLM online alignment framework for research
Created 1 year ago
Updated 2 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 3 more.

ROLL by alibaba

0.5%
3k
RL library for large language models
Created 11 months ago
Updated 13 hours ago
Starred by Eric Zhang (Founding Engineer at Modal), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 3 more.

tunix by google

0.5%
2k
JAX-native library for efficient LLM post-training
Created 1 year ago
Updated 7 hours ago