Awesome-GRPO by WangJingyao07

Advanced LLM reinforcement fine-tuning framework

Created 6 months ago
283 stars

Top 92.2% on SourcePulse

View on GitHub
Project Summary

A curated and extensible repository for GRPO (Group Relative Policy Optimization) and its variants, Awesome-GRPO offers researchers and engineers a unified platform for advanced LLM reinforcement fine-tuning. It pairs practical code implementations with a collection of relevant academic papers, streamlining the exploration and application of cutting-edge RL techniques for LLMs. The primary benefit is simplified access to a diverse set of GRPO-based methods for efficient model fine-tuning.

How It Works

The project features a modular codebase designed for concise, switchable implementations of GRPO and its derivatives, allowing users to change optimization strategies with a single flag. It emphasizes practical deployment by integrating with DeepSpeed for efficient distributed training (supporting ZeRO-2/3) and vLLM for high-throughput inference, facilitating scalable LLM fine-tuning workflows. This architecture supports rapid experimentation and the application of various GRPO-style algorithms.
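The core idea shared by GRPO-style algorithms is that each prompt's sampled completions are scored as a group, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The sketch below illustrates that computation; it is an illustrative example, not code from this repository, and the function name `group_relative_advantages` is hypothetical.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Compute group-relative advantages, GRPO-style (illustrative sketch).

    rewards: (num_prompts, group_size) scalar rewards, one row per prompt's
             group of sampled completions.
    Returns advantages of the same shape: (r - group_mean) / (group_std + eps).
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two correct and two incorrect completions for one prompt:
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
adv = group_relative_advantages(rewards)
# Correct completions get positive advantage, incorrect ones negative,
# and the advantages within each group sum to (approximately) zero.
```

Because the baseline comes from the group itself, no learned value network is needed, which is what makes these methods comparatively cheap to run at LLM scale.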

Quick Start & Requirements

  • Installation: Clone the repository: git clone https://github.com/WangJingyao07/Awesome-GRPO.git
  • Running:
    • Evaluation: CUDA_VISIBLE_DEVICES=7 python ref_client.py
    • Training: CUDA_VISIBLE_DEVICES=2,3,4,5,6 deepspeed train.py --algo grpo (or other variants like dapo).
  • Prerequisites: CUDA-enabled GPU(s) are required, as indicated by CUDA_VISIBLE_DEVICES. Dependencies include DeepSpeed and vLLM.
  • Resources: The repository includes a papers/ directory with collected PDFs and a CODE/ directory for implementations.

Highlighted Details

  • Supports distributed fine-tuning via DeepSpeed ZeRO-2/3.
  • Features efficient LLM inference using the vLLM engine.
  • Implements KL regularization control and token-level reward processing.
  • Integrates WandB logging for comprehensive experiment tracking.
  • Offers code configurations and descriptions for multiple GRPO variants, including GRPO, DAPO, Dr.GRPO, GTPO, and GMPO.
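The KL regularization and token-level reward processing mentioned above typically combine into a per-token loss: a policy-gradient term weighted by the sequence-level advantage, plus a KL penalty against a frozen reference model. The repository's exact implementation may differ; this is a hedged sketch using Schulman's k3 KL estimator, with the function name `grpo_token_loss` and the default `kl_coef` chosen for illustration.

```python
import torch

def grpo_token_loss(logp: torch.Tensor,
                    logp_ref: torch.Tensor,
                    advantages: torch.Tensor,
                    mask: torch.Tensor,
                    kl_coef: float = 0.04) -> torch.Tensor:
    """Token-level GRPO-style loss with a KL penalty (illustrative sketch).

    logp, logp_ref: (batch, seq_len) log-probs of the sampled tokens under
                    the current policy and the frozen reference model.
    advantages:     (batch,) sequence-level group-relative advantages,
                    broadcast to every completion token.
    mask:           (batch, seq_len) 1 for completion tokens, 0 for padding.
    """
    # k3 estimator of KL(policy || reference), per token: always >= 0.
    ratio = torch.exp(logp_ref - logp)
    kl = ratio - (logp_ref - logp) - 1.0
    # Policy-gradient term: push up log-probs of high-advantage sequences.
    pg = -advantages.unsqueeze(-1) * logp
    per_token = (pg + kl_coef * kl) * mask
    return per_token.sum() / mask.sum().clamp(min=1)
```

Variants such as DAPO and Dr.GRPO mostly differ in how the advantage, clipping, and normalization terms of this objective are defined, which is why a single `--algo` flag can switch between them.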

Maintenance & Community

The project shows recent activity with updates logged through late 2025, indicating ongoing development. Specific details on core contributors, community channels (e.g., Discord, Slack), or a public roadmap are not provided in this README.

Licensing & Compatibility

The README does not specify a software license. Absent a license, default copyright applies, so commercial use, redistribution, or closed-source linking cannot be assumed to be permitted without clarification from the author.

Limitations & Caveats

Several GRPO variants are listed as planned for future releases (marked with ☐) and are not yet implemented in the codebase. These include Pref-GRPO, L2T-GRPO, TreePO, GPO, GiGPO, Flow-GRPO, GRPO-SCS, SGPO, Direct Advantage PO, and Diversity Aware PO.

Health Check
Last Commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

oat by sail-sg

0.3%
652
LLM online alignment framework for research
Created 1 year ago
Updated 2 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 3 more.

ROLL by alibaba

0.5%
3k
RL library for large language models
Created 11 months ago
Updated 13 hours ago
Starred by Eric Zhang (Founding Engineer at Modal), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 3 more.

tunix by google

0.5%
2k
JAX-native library for efficient LLM post-training
Created 1 year ago
Updated 7 hours ago