tiny-grpo by open-thought

Toy GRPO implementation for local RL training of LLMs

Created 11 months ago
314 stars

Top 86.1% on SourcePulse

Project Summary

This repository provides a minimal, hackable implementation of GRPO (Group Relative Policy Optimization) for local reinforcement-learning training of language models. GRPO is a PPO variant that drops the learned value model: it samples a group of completions for each prompt and scores each completion against the group's average reward. The project targets researchers and engineers who want to understand GRPO's mechanics and hyperparameter tuning, and it supports experimentation on a single node.
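The group-relative advantage step at the core of GRPO can be sketched in a few lines of plain Python. This is a minimal sketch, assuming one scalar reward per completion and normalization by the group's population standard deviation; real implementations (possibly including train.py) may use the sample standard deviation or a different epsilon. The function name `group_relative_advantages` is hypothetical and does not come from the repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Hypothetical helper: normalize one prompt's group of rewards.

    Each completion's advantage is its reward minus the group mean, divided by
    the group's (population) standard deviation. eps avoids division by zero
    when every completion in the group received the same reward.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored 1.0 (correct) or 0.0 (wrong).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

For the example rewards, the correct completions get an advantage of roughly +1 and the wrong ones roughly -1, so each answer is reinforced or discouraged relative to its siblings rather than against a learned value estimate. A group where every reward is identical yields all-zero advantages and therefore contributes no policy-gradient signal.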

How It Works

The implementation is a simplified version of the GRPO algorithm, with the training loop in train.py so that users can modify and experiment with it directly. It leverages existing RL frameworks and libraries for core functionality, prioritizing clarity and ease of modification over production-ready performance.

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Install FlashAttention: pip install flash-attn --no-build-isolation
  • Requires Python 3.12.
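Before running train.py, a quick preflight check can confirm that the interpreter and the FlashAttention package match the requirements listed above. The `preflight` function is a hypothetical helper, not part of the repository; it only checks for Python 3.12 and for an importable `flash_attn` module, the import name used by the flash-attn package.

```python
import importlib.util
import sys


def preflight() -> dict[str, bool]:
    """Hypothetical helper: report whether the environment matches Quick Start."""
    return {
        # The project summary lists Python 3.12 as the requirement.
        "python_3_12": sys.version_info[:2] == (3, 12),
        # The pip distribution "flash-attn" is imported as "flash_attn".
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

`importlib.util.find_spec` only locates the module without importing it, so the check is cheap; it will not catch a flash-attn build that is installed but incompatible with the local CUDA or PyTorch version.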

Highlighted Details

  • Minimal and hackable GRPO implementation.
  • Designed for local, single-node RL training.
  • Focus on understanding GRPO algorithm and hyperparameters.
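To make the hyperparameter focus concrete, the dataclass below collects knobs that commonly matter in a GRPO run and shows one practical consequence: the number of completions generated per step is the number of prompts times the group size, which largely determines sampling cost on a single node. The class name, field names, and default values are illustrative assumptions, not tiny-grpo's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRPOConfig:
    # Illustrative values only; not read from tiny-grpo's train.py.
    prompts_per_step: int = 8
    group_size: int = 8       # completions sampled per prompt
    clip_eps: float = 0.2     # PPO-style ratio clipping range
    kl_beta: float = 0.04     # weight of the KL penalty toward the reference model
    learning_rate: float = 1e-6

    def __post_init__(self) -> None:
        if self.group_size < 2:
            raise ValueError("group_size must be >= 2 for group statistics to be informative")
        if not 0.0 < self.clip_eps < 1.0:
            raise ValueError("clip_eps must be in (0, 1)")

    @property
    def rollouts_per_step(self) -> int:
        # Every prompt is expanded into group_size sampled completions.
        return self.prompts_per_step * self.group_size
```

Under these assumptions, 8 prompts with a group size of 8 and 4 prompts with a group size of 16 both generate 64 completions per step; the second trades prompt diversity for a less noisy per-group baseline.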

Maintenance & Community

No information on contributors, sponsorships, or community channels is provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial or closed-source use is undetermined.

Limitations & Caveats

This is a toy implementation intended for educational purposes and algorithm understanding, not for production deployment or large-scale training. The lack of explicit licensing and community support may pose adoption risks.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 0
  • Star history: 8 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Huber (Cofounder of Chroma), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 1 more.

arbor by Ziems

0%
302 stars
Framework for optimizing DSPy programs with RL
Created 10 months ago
Updated 3 days ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 3 more.

ROLL by alibaba

2.3%
3k stars
RL library for large language models
Created 7 months ago
Updated 21 hours ago
Starred by Eric Zhang (Founding Engineer at Modal), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 3 more.

tunix by google

0.6%
2k stars
JAX-native library for efficient LLM post-training
Created 9 months ago
Updated 1 day ago