tiny-grpo by open-thought

Toy GRPO implementation for local RL training of LLMs

Created 7 months ago
285 stars

Top 91.9% on SourcePulse

Project Summary

This repository provides a minimal, hackable implementation of GRPO (Group Relative Policy Optimization) for local reinforcement-learning training of language models. It targets researchers and engineers who want to understand GRPO's mechanics and hyperparameter tuning, and it enables experimentation on a single node.

How It Works

The implementation keeps the GRPO algorithm deliberately simple so that users can directly modify and experiment with the training loop in train.py. It leans on existing RL frameworks and libraries for core functionality, prioritizing clarity and ease of modification over production-ready performance.
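
As a rough illustration of the mechanic the project teaches (a sketch, not the repo's actual code): GRPO samples a group of completions per prompt and scores each one against the group's own mean and standard deviation, which stands in for the learned value-function baseline that PPO would use.

    import torch

    def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
        # rewards: (num_prompts, group_size); one row per prompt,
        # one column per sampled completion
        mean = rewards.mean(dim=1, keepdim=True)
        std = rewards.std(dim=1, keepdim=True)
        # normalizing within each group replaces the critic used in PPO
        return (rewards - mean) / (std + eps)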

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Install FlashAttention: pip install flash-attn --no-build-isolation
  • Requires Python 3.12.
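
After the steps above, a quick sanity check along these lines can confirm the environment before running train.py; the flash_attn module name comes from the FlashAttention package itself, while the rest of the snippet is illustrative.

    import sys

    # the repo pins Python 3.12, so fail fast on anything else
    assert sys.version_info[:2] == (3, 12), "tiny-grpo expects Python 3.12"

    # this import fails if the --no-build-isolation build did not succeed
    import flash_attn
    print("flash-attn", flash_attn.__version__)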

Highlighted Details

  • Minimal and hackable GRPO implementation.
  • Designed for local, single-node RL training.
  • Focus on understanding the GRPO algorithm and its hyperparameters (a sketch of the objective follows below).
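
For orientation on that last point, the knobs most commonly tuned in GRPO-style training are the surrogate clipping range and the KL penalty toward a frozen reference model. The following sketch uses the standard clipped objective from the GRPO paper; the name grpo_loss, the argument names, and the default values are illustrative, not the repo's API.

    import torch

    def grpo_loss(logp_new, logp_old, logp_ref, advantages,
                  clip_eps: float = 0.2, kl_coef: float = 0.01):
        # per-token log-probs of the sampled completions; all tensors share a shape
        ratio = torch.exp(logp_new - logp_old)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        surrogate = torch.min(ratio * advantages, clipped * advantages)
        # non-negative "k3" KL estimator to the frozen reference policy
        kl = torch.exp(logp_ref - logp_new) - (logp_ref - logp_new) - 1.0
        return -(surrogate - kl_coef * kl).mean()

Raising clip_eps loosens the trust region per update; raising kl_coef keeps the policy closer to the reference model.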

Maintenance & Community

No information on contributors, sponsorships, or community channels is provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial or closed-source use is undetermined.

Limitations & Caveats

This is a toy implementation intended for educational purposes and algorithm understanding, not for production deployment or large-scale training. The lack of an explicit license and of community support may pose adoption risks.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days
