Awesome-GRPO by WangJingyao07

Advanced LLM reinforcement fine-tuning framework

Created 5 months ago
276 stars

Top 94.0% on SourcePulse

Project Summary

A curated and extensible repository for GRPO (Group Relative Policy Optimization) and its variants, Awesome-GRPO gives researchers and engineers a unified platform for advanced LLM reinforcement fine-tuning. It pairs practical code implementations with a collection of relevant academic papers, streamlining the exploration and application of cutting-edge RL techniques for LLMs. The primary benefit is simplified access to a diverse set of GRPO-based methods for efficient model fine-tuning.

How It Works

The project features a modular codebase designed for concise, switchable implementations of GRPO and its derivatives, allowing users to change optimization strategies with a single flag. It emphasizes practical deployment by integrating with DeepSpeed for efficient distributed training (supporting ZeRO-2/3) and vLLM for high-throughput inference, facilitating scalable LLM fine-tuning workflows. This architecture supports rapid experimentation and the application of various GRPO-style algorithms.
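The group-relative idea at the core of GRPO can be sketched in a few lines. This is an illustrative example, not code from the repository: instead of a learned value baseline, each completion's reward is normalized against the mean and standard deviation of the group of completions sampled for the same prompt.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its group.

    `rewards` holds scalar reward-model scores for completions sampled
    from the same prompt; the group statistics replace a critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, four sampled completions scored by a reward model:
advs = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Because the advantages are centered on the group mean, above-average completions are reinforced and below-average ones are penalized, with no separate value network to train.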

Quick Start & Requirements

  • Installation: Clone the repository: git clone https://github.com/WangJingyao07/Awesome-GRPO.git
  • Running:
    • Evaluation: CUDA_VISIBLE_DEVICES=7 python ref_client.py
    • Training: CUDA_VISIBLE_DEVICES=2,3,4,5,6 deepspeed train.py --algo grpo (or other variants like dapo).
  • Prerequisites: CUDA-enabled GPU(s) are required, as indicated by CUDA_VISIBLE_DEVICES. Dependencies include DeepSpeed and vLLM.
  • Resources: The repository includes a papers/ directory with collected PDFs and a CODE/ directory for implementations.

Highlighted Details

  • Supports distributed fine-tuning via DeepSpeed ZeRO-2/3.
  • Features efficient LLM inference using the vLLM engine.
  • Implements KL regularization control and token-level reward processing.
  • Integrates WandB logging for comprehensive experiment tracking.
  • Offers code configurations and descriptions for multiple GRPO variants, including GRPO, DAPO, Dr.GRPO, GTPO, and GMPO.
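The KL regularization and token-level reward processing mentioned above can be illustrated with a small sketch. Function names and the `beta` coefficient here are hypothetical, and the per-token KL uses the low-variance "k3" estimator (`exp(d) - d - 1`) common in RLHF-style implementations; the repository's exact formulation may differ.

```python
import math

def token_kl_penalty(logp, ref_logp):
    """Per-token KL estimate between policy and frozen reference model.

    Uses the non-negative "k3" estimator: exp(d) - d - 1, d = ref_logp - logp.
    It is zero when the two models agree on the token and grows as they diverge.
    """
    d = ref_logp - logp
    return math.exp(d) - d - 1.0

def apply_kl_to_rewards(token_rewards, logps, ref_logps, beta=0.04):
    """Token-level reward shaping: subtract a scaled KL penalty per token."""
    return [r - beta * token_kl_penalty(lp, rlp)
            for r, lp, rlp in zip(token_rewards, logps, ref_logps)]
```

Penalizing KL at the token level (rather than once per sequence) keeps the policy close to the reference model everywhere in the generation, which is the usual motivation for this style of regularization.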

Maintenance & Community

The project shows recent activity with updates logged through late 2025, indicating ongoing development. Specific details on core contributors, community channels (e.g., Discord, Slack), or a public roadmap are not provided in this README.

Licensing & Compatibility

The README does not specify a software license. Without one, the code defaults to all-rights-reserved copyright, so suitability for commercial use or closed-source integration cannot be determined without clarification from the author.

Limitations & Caveats

Several GRPO variants are listed as planned for future releases (marked with ☐) and are not yet implemented in the codebase. These include Pref-GRPO, L2T-GRPO, TreePO, GPO, GiGPO, Flow-GRPO, GRPO-SCS, SGPO, Direct Advantage PO, and Diversity Aware PO.

Health Check
  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 166 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

oat by sail-sg
Top 0.1% on SourcePulse · 638 stars
LLM online alignment framework for research
Created 1 year ago · Updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 3 more.

ROLL by alibaba
Top 1.4% on SourcePulse · 3k stars
RL library for large language models
Created 9 months ago · Updated 1 day ago
Starred by Eric Zhang (Founding Engineer at Modal), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 3 more.

tunix by google
Top 0.5% on SourcePulse · 2k stars
JAX-native library for efficient LLM post-training
Created 11 months ago · Updated 21 hours ago