simple_GRPO  by lsdefine

A GRPO implementation for reproducing R1-style LLM reasoning

created 5 months ago
1,227 stars

Top 32.7% on sourcepulse

Project Summary

This repository provides a simplified implementation of the GRPO (Group Relative Policy Optimization) algorithm for large language models. It targets researchers and engineers who want to understand and experiment with reinforcement learning for LLMs, including RLHF (Reinforcement Learning from Human Feedback) concepts. It aims to reduce GPU memory usage and make it quick to iterate on RL training parameters and techniques.
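The "group relative" part of GRPO can be illustrated with a short sketch: several completions are sampled for the same prompt, and each completion's reward is normalized against the rest of its group. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration, not code taken from the repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    `eps` is an illustrative constant that avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two judged correct, two judged incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value network is needed. A group whose rewards are all identical yields all-zero advantages and therefore carries no learning signal.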

How It Works

The implementation borrows its core loss formula from Hugging Face's trl library. A key architectural choice is decoupling the reference model so that it can run on a separate GPU or machine. This removes the reference model's memory footprint from the training GPUs, which makes it possible to train larger models (e.g., 7B parameters) on more accessible hardware. The project also uses Triton to speed up the loss calculation and vLLM to accelerate generation.
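As a rough illustration of a trl-style GRPO loss, the sketch below computes the per-token objective on plain floats: a clipped policy-ratio term minus a KL penalty against the reference model. The function names and the defaults for `clip_eps` and `beta` are assumptions, and a real implementation would operate on batched tensors with gradients rather than scalars.

```python
import math


def grpo_token_loss(logp, old_logp, ref_logp, advantage, clip_eps=0.2, beta=0.04):
    """Per-token GRPO-style loss on scalar log-probabilities (illustrative only)."""
    # Probability ratio between the current policy and the sampling policy.
    ratio = math.exp(logp - old_logp)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * advantage, clipped * advantage)
    # Non-negative KL estimate against the frozen reference model:
    # exp(d) - d - 1 with d = ref_logp - logp.
    delta = ref_logp - logp
    kl = math.exp(delta) - delta - 1.0
    return -(surrogate - beta * kl)


def grpo_completion_loss(logps, old_logps, ref_logps, advantage):
    """Average the per-token loss over one completion's tokens."""
    losses = [
        grpo_token_loss(lp, olp, rlp, advantage)
        for lp, olp, rlp in zip(logps, old_logps, ref_logps)
    ]
    return sum(losses) / len(losses)
```

The reference log-probabilities enter only through the KL term, which is why they can be computed elsewhere and shipped to the trainer as plain numbers.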

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires at least two GPUs.
  • Example usage involves running a reference model server on one GPU (CUDA_VISIBLE_DEVICES=7 python ref_server.py) and the training process on others (CUDA_VISIBLE_DEVICES=2,3,4,5,6 deepspeed grpo_vllm_one.py).
  • Official documentation and demo are not explicitly linked, but usage examples are provided.
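The two-process layout above, with a reference server on one GPU and the trainer on the others, can be sketched with the Python standard library alone. Everything here is hypothetical: the `/ref` path, the JSON payload shape, and `fake_ref_logprobs`, which stands in for a frozen reference model's forward pass. The repository's actual `ref_server.py` may be organized quite differently.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_ref_logprobs(token_ids):
    """Hypothetical stand-in for the reference model's per-token log-probs."""
    return [-0.1 * (t % 7 + 1) for t in token_ids]


class RefHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the token ids sent by the trainer and answer with log-probs.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = {"ref_logps": fake_ref_logprobs(payload["token_ids"])}
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_ref_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), RefHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_ref_logprobs(url, token_ids):
    """Trainer-side call: send token ids, receive reference log-probs."""
    data = json.dumps({"token_ids": token_ids}).encode()
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # Bypass any configured HTTP proxy so the local request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["ref_logps"]


server = start_ref_server()
url = f"http://127.0.0.1:{server.server_address[1]}/ref"
ref_logps = fetch_ref_logprobs(url, [3, 14, 15])
server.shutdown()
server.server_close()
```

Because the trainer only ever receives numbers over the wire, the reference model's weights never occupy memory on the training GPUs, which is the saving the decoupled design is after.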

Highlighted Details

  • Achieves rapid training times, with Qwen2.5-3B and 7B models showing an "Aha moment" within 30 steps on a single A800 GPU.
  • Codebase is intentionally kept simple (approx. 200 lines across 2 files) for ease of understanding and modification.
  • Supports experimental features like regrouping, KL penalty, and parameter tuning.
  • Recently added a Triton implementation of the loss for potential speedups, as well as the REINFORCE++ algorithm.

Maintenance & Community

The project is led by researchers from Fudan University's KnowledgeWorks Lab. Core development is handled by Ph.D. and Master's students. Community channels like Discord/Slack are not mentioned.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. This requires further investigation for commercial use or closed-source integration.

Limitations & Caveats

The project is described as "simple" and "experimental." Known limitations include potential invalid answer generation due to group imbalances and tight GPU memory requirements for generating long context outputs, which the team is actively addressing.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 3

Star History

  • 231 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

applied-ai by pytorch-labs

0.3%
289
Applied AI experiments and examples for PyTorch
created 2 years ago
updated 2 months ago
Starred by Ying Sheng (Author of SGLang) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

llm-analysis by cli99

0.2%
441
CLI tool for LLM latency/memory analysis during training/inference
created 2 years ago
updated 3 months ago