DanceGRPO by XueZeyue

Unified RL framework for visual generation

Created 6 months ago

1,288 stars

Top 30.8% on SourcePulse

1 Expert Loves This Project

Jiaming Song

Chief Scientist at Luma AI

Project Summary

DanceGRPO is a unified reinforcement learning (RL) framework for visual generation, supporting various diffusion models like Stable Diffusion, FLUX, HunyuanVideo, SkyReels-I2V, and Qwen-Image. It aims to provide an efficient and scalable approach for researchers and practitioners to enhance visual generation tasks through RL-based fine-tuning.

How It Works

DanceGRPO applies the GRPO (Group Relative Policy Optimization) algorithm to fine-tune existing visual generation models. It treats the generation process as a sequential decision-making problem: the model produces several samples for a prompt, a reward model scores them, and the policy is updated toward the samples that score better than the rest of their group. This allows direct optimization of generation quality against specific metrics or reward functions, targeting visual fidelity and prompt adherence rather than relying on a supervised fine-tuning loss alone.
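
The group-relative idea can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the example rewards, and the clip range of 0.2 are assumptions chosen for illustration, and the clipped surrogate shown is the generic PPO-style form that GRPO-type methods commonly use.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each sample's advantage is its reward minus the group mean, divided by
    the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_range=0.2):
    """PPO-style clipped objective for a single step (to be maximized).

    clip_range=0.2 is an illustrative value, not taken from the project.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    return min(ratio * advantage, clipped * advantage)


# Hypothetical reward-model scores for four samples of one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# Samples above the group mean get positive advantages, the rest negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because advantages are computed relative to the group, no separate value network is needed; the other samples for the same prompt act as the baseline.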

Quick Start & Requirements

Installation: Run ./env_setup.sh for environment setup.
Prerequisites: Requires checkpoints for Stable Diffusion v1.4, FLUX, HPS-v2.1, CLIP H-14, HunyuanVideo, SkyReels-I2V, Qwen2-VL-2B-Instruct, and VideoAlign. Specific download links are provided in the README.
Hardware: Training scripts indicate requirements for multiple high-end GPUs (e.g., 8 or 16 H800 GPUs). FLUX LoRA training requires ~20GB VRAM per GPU.
Resources: Training FLUX can be completed within 12 hours on 16 H800 GPUs.
Docs: Links to example slides, FLUX checkpoints, and FastVideo README are provided.

Highlighted Details

Supports Stable Diffusion, FLUX, HunyuanVideo, SkyReels-I2V, and Qwen-Image.
Released training scripts for FLUX, Stable Diffusion, HunyuanVideo, SkyReels-I2V, and Qwen-Image.
Offers LoRA training for FLUX, reducing VRAM requirements.
Provides visualization scripts and reward curve analysis for various models.
Supports custom model integration by modifying specific Python scripts.

Maintenance & Community

The project is actively updated with new releases and training scripts. The authors encourage community engagement through issues and direct email contact.

Licensing & Compatibility

The repository does not explicitly state a license. However, it reuses code from FastVideo, diffusers, and DDPO-Pytorch, which have their own licenses. Users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

The setup and training scripts are heavily optimized for specific hardware configurations (e.g., H800 GPUs), which may limit accessibility for users with different setups. Some training configurations require a large number of GPUs (e.g., 16 or 32). Achieving good results may also depend on the choice of reward model and on hyperparameter tuning, as indicated by the project's discussion of reward collapse and gradient accumulation.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

160 stars in the last 30 days