DanceGRPO by XueZeyue

Unified RL framework for visual generation

Created 4 months ago
874 stars

Top 41.0% on SourcePulse

1 Expert Loves This Project
Project Summary

DanceGRPO is a unified reinforcement learning (RL) framework for visual generation, supporting various diffusion models like Stable Diffusion, FLUX, HunyuanVideo, SkyReels-I2V, and Qwen-Image. It aims to provide an efficient and scalable approach for researchers and practitioners to enhance visual generation tasks through RL-based fine-tuning.

How It Works

DanceGRPO applies the GRPO (Group Relative Policy Optimization) algorithm to fine-tune existing visual generation models. It treats the generation process as a sequential decision-making problem and uses rewards to guide the model towards desired outputs. This allows generation quality to be optimized directly against specific metrics or reward functions, offering a way to improve visual fidelity and prompt adherence compared with traditional fine-tuning methods.
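
The "group relative" part of GRPO, as the algorithm is commonly described, can be illustrated with a minimal sketch: several samples are generated for the same prompt, each is scored by a reward model, and every reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the reward values below are hypothetical and chosen for illustration only; this is not DanceGRPO's actual code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of samples generated from the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    samples that beat their group average get a positive advantage.
    `eps` (an assumed small constant) avoids division by zero when
    all rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

In this sketch the third sample, which has the highest reward, receives the largest positive advantage, and the advantages sum to approximately zero. A policy update would then raise the likelihood of samples with positive advantages and lower it for the rest.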

Quick Start & Requirements

  • Installation: Run ./env_setup.sh for environment setup.
  • Prerequisites: Requires checkpoints for Stable Diffusion v1.4, FLUX, HPS-v2.1, CLIP H-14, HunyuanVideo, SkyReels-I2V, Qwen2-VL-2B-Instruct, and VideoAlign. Specific download links are provided in the README.
  • Hardware: Training scripts indicate requirements for multiple high-end GPUs (e.g., 8 or 16 H800 GPUs). FLUX LoRA training requires ~20GB VRAM per GPU.
  • Resources: Training FLUX can be completed within 12 hours on 16 H800 GPUs.
  • Docs: Links to example slides, FLUX checkpoints, and FastVideo README are provided.

Highlighted Details

  • Supports Stable Diffusion, FLUX, HunyuanVideo, SkyReels-I2V, and Qwen-Image.
  • Released training scripts for FLUX, Stable Diffusion, HunyuanVideo, SkyReels-I2V, and Qwen-Image.
  • Offers LoRA training for FLUX, reducing VRAM requirements.
  • Provides visualization scripts and reward curve analysis for various models.
  • Supports custom model integration by modifying specific Python scripts.

Maintenance & Community

The project is actively updated with new releases and training scripts. The authors encourage community engagement through issues and direct email contact.

Licensing & Compatibility

The repository does not explicitly state a license. However, it reuses code from FastVideo, diffusers, and DDPO-Pytorch, which have their own licenses. Users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

The setup and training scripts target specific hardware configurations (e.g., H800 GPUs), which may limit accessibility for users with different setups. Some training configurations require a large number of GPUs (e.g., 16 or 32). Achieving optimal results may also depend on the choice of reward model and on hyperparameter tuning, as indicated by the discussion on reward collapse and gradient accumulation.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 14
  • Star History: 229 stars in the last 30 days
