flow_grpo by yifan123

RL for training flow matching models

Created 2 months ago · 978 stars · Top 38.5% on sourcepulse

Project Summary

This repository provides an official implementation of Flow-GRPO, a method for training Flow Matching models using online Reinforcement Learning. It is designed for researchers and practitioners in generative AI, particularly those working with text-to-image diffusion models, offering a flexible framework to incorporate multiple reward signals for enhanced model alignment and quality.

How It Works

Flow-GRPO leverages online Reinforcement Learning to fine-tune Flow Matching models. The core idea is to use a suite of reward models to guide the generation process, optimizing for various quality metrics simultaneously. This approach allows for a more nuanced and controllable generation process compared to single-objective optimization, enabling users to balance different aspects of image quality and alignment through weighted reward combinations.
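
To make the weighted-reward idea concrete, here is a minimal sketch of combining several per-model scores into one scalar RL signal. The reward names, weights, and values below are illustrative assumptions, not Flow-GRPO's actual API.

```python
# Minimal sketch of weighted multi-reward aggregation.
# The reward names, weights, and scores are hypothetical examples,
# not Flow-GRPO's actual interface.

def combined_reward(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-model reward scores, used as a single RL signal."""
    return sum(weights[name] * scores[name] for name in weights)

# Scores a trainer might receive from three reward models for one sample.
scores = {"pickscore": 0.82, "ocr": 0.40, "geneval": 1.0}
weights = {"pickscore": 0.5, "ocr": 0.3, "geneval": 0.2}

print(combined_reward(scores, weights))  # 0.73
```

Adjusting the weights shifts the trade-off, for example favoring text-rendering accuracy (OCR) over general human preference (PickScore), without retraining any reward model.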

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies with pip install -e . inside a conda environment (Python 3.10.16 recommended); a consolidated setup sketch follows this list.
  • Reward Models: Each reward model (GenEval, OCR, PickScore, DeQA, UnifiedReward, etc.) may require separate Conda environments and specific installations (e.g., paddlepaddle-gpu==2.6.2, paddleocr==2.9.1 for OCR).
  • Resources: Training scripts for single-node and multi-node setups are provided. Hyperparameter tuning guidance is available in config/dgx.py.
  • Demos: An online demo is available on Hugging Face: https://huggingface.co/spaces/jieliu/SD3.5-M-Flow-GRPO. Image examples: https://gongyeliu.github.io/Flow-GRPO.
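
The steps above condense to roughly the following; the environment names are placeholders, and the repository README remains the authoritative reference:

```bash
# Base environment for Flow-GRPO training (Python 3.10.16 per the README).
conda create -n flow_grpo python=3.10.16
conda activate flow_grpo
git clone https://github.com/yifan123/flow_grpo.git
cd flow_grpo
pip install -e .

# Example: an isolated environment for the OCR reward model, whose pinned
# paddle versions may conflict with the training environment's packages.
conda create -n ocr_reward python=3.10.16
conda activate ocr_reward
pip install paddlepaddle-gpu==2.6.2 paddleocr==2.9.1
```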

Highlighted Details

  • Supports training with multiple, weighted reward models (e.g., GenEval, OCR, PickScore, DeQA, UnifiedReward).
  • Offers online RL training for Flow Matching models, a novel approach for this domain.
  • Built on the ddpo-pytorch and diffusers libraries.
  • Provides pre-trained reward models and guidance on deploying them via services like sglang.
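
Since reward models can be deployed as standalone services (e.g., via sglang, as noted above), a training process would score samples over the network. The endpoint, port, and JSON schema below are hypothetical illustrations, not the project's documented interface:

```python
# Hypothetical client for a reward model exposed as an HTTP service.
# The URL and payload schema are assumptions for illustration; consult the
# repository's deployment instructions for the real interface.
import base64

import requests

def query_reward_service(image_bytes: bytes, prompt: str,
                         url: str = "http://localhost:30000/reward") -> float:
    """Send one (image, prompt) pair to a reward service and return its score."""
    payload = {
        "prompt": prompt,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
    }
    resp = requests.post(url, json=payload, timeout=60)
    resp.raise_for_status()
    return float(resp.json()["score"])
```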

Maintenance & Community

The project is based on ddpo-pytorch and diffusers. Further community engagement details (e.g., Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The README emphasizes that different reward models need separate Conda environments due to potential version conflicts, so dependency management is nontrivial. The project is presented as the implementation of a research paper; its production-readiness and long-term maintenance status are not detailed.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 25
  • Star History: 992 stars in the last 90 days
