Toy GRPO implementation for local RL training of LLMs
Top 95.2% on sourcepulse
This repository provides a minimal, hackable implementation of GRPO (Group Relative Policy Optimization) for local reinforcement learning training of language models. It targets researchers and engineers who want to understand GRPO's mechanics and hyperparameter tuning, enabling experimentation on a single node.
How It Works
The implementation focuses on a simplified GRPO algorithm, allowing users to directly modify and experiment with the training loop in train.py. It leverages existing RL frameworks and libraries for core functionality, prioritizing clarity and ease of modification over production-ready performance.
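The core idea that distinguishes GRPO from PPO is that it needs no learned value function (critic): each completion's reward is normalized against the statistics of its own sampled group to form the advantage. The sketch below illustrates that group-relative advantage computation; it is a minimal illustration of the algorithm's idea, not the actual code in train.py, and the function name is hypothetical.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each completion's reward by the
    mean and (population) std of its own sampled group, so no critic
    network is required. `rewards` holds the scores of all completions
    sampled for a single prompt."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Example: 4 completions sampled for one prompt, scored by a reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

These advantages are then plugged into a PPO-style clipped policy-gradient loss over the group, which is what a simplified training loop like this one iterates on.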
Quick Start & Requirements
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
Maintenance & Community
No information on contributors, sponsorships, or community channels is provided in the README.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial or closed-source use is undetermined.
Limitations & Caveats
This is a toy implementation intended for educational purposes and algorithm understanding, not for production deployment or large-scale training. The lack of explicit licensing and community support may pose adoption risks.
Last updated 6 months ago; marked inactive.