tiny-grpo by open-thought

Toy GRPO implementation for local RL training of LLMs

created 6 months ago
274 stars

Top 95.2% on sourcepulse

View on GitHub
1 Expert Loves This Project: starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).
Project Summary

This repository provides a minimal, hackable implementation of GRPO (Group Relative Policy Optimization) for local reinforcement learning training of language models. It targets researchers and engineers who want to understand GRPO's mechanics and hyperparameter tuning, enabling experimentation on a single node.

How It Works

The implementation focuses on a simplified GRPO algorithm, keeping the full training loop in train.py so users can modify and experiment with it directly. It leverages existing libraries for core model functionality, prioritizing clarity and ease of modification over production-ready performance.
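As an illustration of the core idea, here is a minimal PyTorch sketch of the two ingredients that distinguish GRPO. This is not the repository's actual code; the function names and tensor shapes are assumptions, but the group-normalized advantage (replacing a learned value baseline) and the PPO-style clipped surrogate are the essence of the algorithm:

    import torch

    def group_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
        # rewards: (num_prompts, group_size), one reward per sampled completion.
        # GRPO needs no value network: each completion's baseline is the mean
        # reward of the other completions sampled for the same prompt.
        mean = rewards.mean(dim=1, keepdim=True)
        std = rewards.std(dim=1, keepdim=True)
        return (rewards - mean) / (std + eps)

    def grpo_loss(log_probs_new: torch.Tensor,
                  log_probs_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
        # PPO-style clipped surrogate, applied per completion.
        ratio = (log_probs_new - log_probs_old).exp()
        clipped = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps)
        # The surrogate is maximized, so negate it to obtain a loss.
        return -torch.min(ratio * advantages, clipped * advantages).mean()

Full GRPO implementations typically also add a KL penalty against a frozen reference model; how that penalty is weighted is exactly the kind of hyperparameter this repository invites you to experiment with.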

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Install FlashAttention: pip install flash-attn --no-build-isolation
  • Requires Python 3.12.
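With dependencies installed, training should be launchable from the repository root. Assuming the defaults in train.py suffice (the README does not document command-line arguments), a run would start with:

    python train.py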

Highlighted Details

  • Minimal and hackable GRPO implementation.
  • Designed for local, single-node RL training.
  • Focus on understanding GRPO algorithm and hyperparameters.

Maintenance & Community

No information on contributors, sponsorships, or community channels is provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial or closed-source use is undetermined.

Limitations & Caveats

This is a toy implementation intended for educational purposes and algorithm understanding, not for production deployment or large-scale training. The lack of explicit licensing and community support may pose adoption risks.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 57 stars in the last 90 days

Explore Similar Projects

HALOs by ContextualAI

Library for aligning LLMs using human-aware loss functions

created 1 year ago
updated 2 weeks ago
873 stars

Top 0.2% on sourcepulse