Awesome-GRPO by WangJingyao07
Advanced LLM reinforcement fine-tuning framework
Top 94.0% on SourcePulse
Awesome-GRPO is a curated, extensible repository for GRPO (Group Relative Policy Optimization) and its variants. It pairs practical code implementations with a collection of relevant academic papers, giving researchers and engineers a single place to explore and apply GRPO-style reinforcement fine-tuning for LLMs. The primary benefit is simplified access to a diverse set of GRPO-based fine-tuning methods.
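For readers new to GRPO, the "group relative" part can be sketched as follows: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its group. This is a minimal illustration, not code from the repository; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other completions of the same prompt.

    `rewards` holds one scalar reward per sampled completion in a group.
    `eps` (an assumption of this sketch) guards against a zero std when
    every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two rewarded, two not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and the rest get a negative one, so no separate value network is needed to form a baseline.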
How It Works
The project features a modular codebase with concise, switchable implementations of GRPO and its derivatives, so users can change the optimization strategy with a single flag (--algo). For practical deployment it integrates with DeepSpeed for distributed training (ZeRO-2/3) and with vLLM for high-throughput inference, which lets fine-tuning workflows scale across GPUs. This architecture supports rapid experimentation with different GRPO-style algorithms.
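One common way to make algorithms switchable by a single flag is a small registry keyed by the flag value. The sketch below is hypothetical and does not reflect the repository's actual code: the registry, the function names, and the per-sample loss signatures are invented for illustration. The DAPO entry shows only the asymmetric clipping range, not the full method, and the clip values are illustrative defaults.

```python
import argparse

# Hypothetical registry mapping an --algo value to a loss function.
ALGORITHMS = {}


def register(name):
    """Decorator that records a loss function under the given flag value."""
    def decorator(fn):
        ALGORITHMS[name] = fn
        return fn
    return decorator


@register("grpo")
def grpo_loss(ratio, advantage, clip=0.2):
    # PPO-style clipped surrogate with a symmetric clip range.
    clipped = max(min(ratio, 1 + clip), 1 - clip)
    return -min(ratio * advantage, clipped * advantage)


@register("dapo")
def dapo_loss(ratio, advantage, clip_low=0.2, clip_high=0.28):
    # Same surrogate, but with decoupled lower and upper clip bounds.
    clipped = max(min(ratio, 1 + clip_high), 1 - clip_low)
    return -min(ratio * advantage, clipped * advantage)


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--algo", choices=sorted(ALGORITHMS), default="grpo")
    return parser.parse_args(argv)


# Selecting the variant is a dictionary lookup on the flag value.
args = parse_args(["--algo", "dapo"])
loss_fn = ALGORITHMS[args.algo]
```

With this layout, adding a variant means registering one more function; the training loop only ever calls `loss_fn`.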
Quick Start & Requirements
Clone the repository: git clone https://github.com/WangJingyao07/Awesome-GRPO.git
Run ref_client.py: CUDA_VISIBLE_DEVICES=7 python ref_client.py
Launch training with DeepSpeed: CUDA_VISIBLE_DEVICES=2,3,4,5,6 deepspeed train.py --algo grpo (or another variant, such as dapo)
GPUs are selected through CUDA_VISIBLE_DEVICES. Dependencies include DeepSpeed and vLLM.
The repository contains a papers/ directory with collected PDFs and a CODE/ directory for implementations.
Highlighted Details
Maintenance & Community
The project shows recent activity with updates logged through late 2025, indicating ongoing development. Specific details on core contributors, community channels (e.g., Discord, Slack), or a public roadmap are not provided in this README.
Licensing & Compatibility
The README does not specify a software license. This absence makes it impossible to determine compatibility for commercial use or closed-source linking without further clarification.
Limitations & Caveats
Several GRPO variants are listed as planned for future releases (marked with ☐) and are not yet implemented in the codebase. These include Pref-GRPO, L2T-GRPO, TreePO, GPO, GiGPO, Flow-GRPO, GRPO-SCS, SGPO, Direct Advantage PO, and Diversity Aware PO.