VADER: video diffusion fine-tuning via reward gradients (research paper)
Top 90.9% on sourcepulse
This repository provides VADER (Video Diffusion Alignment via Reward Gradients), a method for fine-tuning video diffusion models to align with specific downstream tasks like aesthetic generation or text-video coherence. It targets researchers and developers working with foundational video diffusion models, offering an efficient alternative to supervised fine-tuning by leveraging pre-trained reward models.
How It Works
VADER backpropagates dense gradients from pre-trained reward models (e.g., HPS, PickScore, YOLO) through the generated pixels into the video diffusion model. These dense, per-pixel reward gradients make learning efficient in the large search space of video generation, enabling alignment with objectives such as aesthetics, text-video similarity, and longer video generation without requiring extensive curated datasets.
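The core idea above can be sketched in a few lines: generate pixels differentiably, score them with a differentiable reward model, and backpropagate the reward gradient into the generator's weights. This is a minimal toy sketch, not the VADER implementation; `TinyGenerator` and the brightness-based `TinyReward` are hypothetical stand-ins for a video diffusion model and a pre-trained scorer such as HPS or PickScore.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a video diffusion model: maps a latent to a
# flattened 8x8 RGB "frame". In VADER this would be the denoising model.
class TinyGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 3 * 8 * 8)

    def forward(self, z):
        return torch.tanh(self.net(z)).view(-1, 3, 8, 8)

# Hypothetical stand-in for a pre-trained reward model: a toy "aesthetic"
# proxy that prefers bright frames and is differentiable w.r.t. pixels.
class TinyReward(nn.Module):
    def forward(self, frames):
        return frames.mean(dim=(1, 2, 3))

gen, reward_model = TinyGenerator(), TinyReward()
opt = torch.optim.Adam(gen.parameters(), lr=1e-2)

for _ in range(50):
    z = torch.randn(4, 16)
    frames = gen(z)                      # generated pixels (gradients flow)
    loss = -reward_model(frames).mean()  # ascend the reward
    opt.zero_grad()
    loss.backward()                      # dense pixel-level reward gradients
    opt.step()
```

The key contrast with RL-style fine-tuning (e.g., policy gradients) is that `loss.backward()` uses the reward model's full gradient with respect to every pixel, rather than a scalar reward signal, which is what makes the search efficient.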
Quick Start & Requirements
xformers is also a dependency.
Highlighted Details
Maintenance & Community
The project is associated with authors from institutions like CMU. Links to a website and Hugging Face demo are provided.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, it builds upon other open-source projects, suggesting potential licensing considerations for commercial use.
Limitations & Caveats
Support for Stable Video Diffusion is listed as a planned feature but not yet implemented. The README notes potential issues with fp16 precision for certain Open-Sora configurations, recommending bf16 instead.
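The fp16 caveat comes down to dynamic range: fp16's largest finite value is about 65504, so large activations overflow to infinity, while bf16 keeps fp32's exponent range (at reduced mantissa precision). A small illustration in PyTorch:

```python
import torch

# fp16 overflows on values above ~65504; bf16 represents them (less precisely).
big = 70000.0
fp16 = torch.tensor(big, dtype=torch.float16)
bf16 = torch.tensor(big, dtype=torch.bfloat16)
print(torch.isinf(fp16).item())  # fp16 overflows to inf
print(torch.isinf(bf16).item())  # bf16 stays finite
```

This is why switching an Open-Sora configuration's compute dtype from fp16 to bf16 can avoid NaN/Inf losses without otherwise changing the setup.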
Last updated 4 months ago; the repository is currently inactive.