flow_grpo by yifan123

RL for training flow matching models

Created 2 months ago · 978 stars · Top 38.5% on sourcepulse

Project Summary

This repository provides an official implementation of Flow-GRPO, a method for training Flow Matching models using online Reinforcement Learning. It is designed for researchers and practitioners in generative AI, particularly those working with text-to-image diffusion models, offering a flexible framework to incorporate multiple reward signals for enhanced model alignment and quality.

How It Works

Flow-GRPO leverages online Reinforcement Learning to fine-tune Flow Matching models. The core idea is to use a suite of reward models to guide the generation process, optimizing for various quality metrics simultaneously. This approach allows for a more nuanced and controllable generation process compared to single-objective optimization, enabling users to balance different aspects of image quality and alignment through weighted reward combinations.
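
To make the weighted-reward idea concrete, here is a minimal sketch of combining several per-model scores into one scalar RL signal. The reward names, weights, and values below are illustrative assumptions, not Flow-GRPO's actual API.

```python
# Minimal sketch of weighted multi-reward aggregation.
# The reward names, weights, and scores are hypothetical examples,
# not Flow-GRPO's actual interface.

def combined_reward(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-model reward scores, used as a single RL signal."""
    return sum(weights[name] * scores[name] for name in weights)

# Scores a trainer might receive from three reward models for one sample.
scores = {"pickscore": 0.82, "ocr": 0.40, "geneval": 1.0}
weights = {"pickscore": 0.5, "ocr": 0.3, "geneval": 0.2}

print(combined_reward(scores, weights))  # 0.73
```

Adjusting the weights shifts the trade-off, for example favoring text-rendering accuracy (OCR) over general human preference (PickScore), without retraining any reward model.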

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies with pip install -e . inside a conda environment (Python 3.10.16 recommended); a consolidated setup sketch follows this list.
  • Reward Models: Each reward model (GenEval, OCR, PickScore, DeQA, UnifiedReward, etc.) may require separate Conda environments and specific installations (e.g., paddlepaddle-gpu==2.6.2, paddleocr==2.9.1 for OCR).
  • Resources: Training scripts for single-node and multi-node setups are provided. Hyperparameter tuning guidance is available in config/dgx.py.
  • Demos: An online demo is available on Hugging Face: https://huggingface.co/spaces/jieliu/SD3.5-M-Flow-GRPO. Image examples: https://gongyeliu.github.io/Flow-GRPO.
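
The steps above condense to roughly the following; the environment names are placeholders, and the repository README remains the authoritative reference:

```bash
# Base environment for Flow-GRPO training (Python 3.10.16 per the README).
conda create -n flow_grpo python=3.10.16
conda activate flow_grpo
git clone https://github.com/yifan123/flow_grpo.git
cd flow_grpo
pip install -e .

# Example: an isolated environment for the OCR reward model, whose pinned
# paddle versions may conflict with the training environment's packages.
conda create -n ocr_reward python=3.10.16
conda activate ocr_reward
pip install paddlepaddle-gpu==2.6.2 paddleocr==2.9.1
```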

Highlighted Details

  • Supports training with multiple, weighted reward models (e.g., GenEval, OCR, PickScore, DeQA, UnifiedReward).
  • Offers online RL training for Flow Matching models, a novel approach for this domain.
  • Built on the ddpo-pytorch and diffusers libraries.
  • Provides pre-trained reward models and guidance on deploying them via services like sglang.
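
Since reward models can be deployed as standalone services (e.g., via sglang, as noted above), a training process would score samples over the network. The endpoint, port, and JSON schema below are hypothetical illustrations, not the project's documented interface:

```python
# Hypothetical client for a reward model exposed as an HTTP service.
# The URL and payload schema are assumptions for illustration; consult the
# repository's deployment instructions for the real interface.
import base64

import requests

def query_reward_service(image_bytes: bytes, prompt: str,
                         url: str = "http://localhost:30000/reward") -> float:
    """Send one (image, prompt) pair to a reward service and return its score."""
    payload = {
        "prompt": prompt,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
    }
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    return float(resp.json()["score"])
```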

Maintenance & Community

The project is based on ddpo-pytorch and diffusers. Further community engagement details (e.g., Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The README emphasizes that different reward models need separate Conda environments due to potential version conflicts, so dependency management is nontrivial. The project is presented as the implementation of a research paper; its production-readiness and long-term maintenance status are not detailed.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 25
  • Star History: 992 stars in the last 90 days
