VADER by mihirp1998

Research paper code for fine-tuning video diffusion models via reward gradients

Created 1 year ago
301 stars

Top 88.6% on SourcePulse

Project Summary

This repository provides VADER (Video Diffusion Alignment via Reward Gradients), a method for fine-tuning video diffusion models to align with specific downstream tasks like aesthetic generation or text-video coherence. It targets researchers and developers working with foundational video diffusion models, offering an efficient alternative to supervised fine-tuning by leveraging pre-trained reward models.

How It Works

VADER uses dense gradient information from pre-trained reward models (e.g., HPS, PickScore, YOLO), taken with respect to the generated pixels, to update the video diffusion model. This makes learning efficient in the large search space of video generation and enables alignment with objectives such as aesthetics, text-video similarity, and longer video generation without requiring large curated datasets.
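
The idea can be illustrated with a deliberately tiny sketch. It is not VADER's implementation: the one-parameter `generate` function stands in for a video diffusion model, `reward` stands in for a differentiable reward model, and every name and value is hypothetical. It shows only the chain rule behind reward-gradient fine-tuning, where the reward's gradient with respect to the generated pixel is propagated back to the generator's parameter.

```python
# Toy illustration of reward-gradient fine-tuning (hypothetical names, not VADER's code).

TARGET = 0.8  # pixel value the toy reward model prefers (assumed for illustration)


def generate(theta: float, noise: float) -> float:
    """Toy 'generator': maps noise to a single pixel value."""
    return theta * noise


def reward(pixel: float) -> float:
    """Toy differentiable 'reward model': highest when pixel == TARGET."""
    return -(pixel - TARGET) ** 2


def reward_grad_wrt_pixel(pixel: float) -> float:
    """Analytic d(reward)/d(pixel) for the toy reward."""
    return -2.0 * (pixel - TARGET)


def finetune(theta: float, noise: float, lr: float = 0.1, steps: int = 200) -> float:
    """Gradient ascent on the reward, using the chain rule through the generator."""
    for _ in range(steps):
        pixel = generate(theta, noise)
        # d(reward)/d(theta) = d(reward)/d(pixel) * d(pixel)/d(theta)
        grad_theta = reward_grad_wrt_pixel(pixel) * noise
        theta += lr * grad_theta
    return theta


theta_before = 0.0
theta_after = finetune(theta_before, noise=1.0)
print(reward(generate(theta_before, 1.0)), reward(generate(theta_after, 1.0)))
```

A gradient-free method would instead have to estimate the update direction from scalar reward values alone, which generally takes many more reward queries per update.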

Quick Start & Requirements

  • Installation: Each supported model (VideoCrafter, Open-Sora, ModelScope) requires its own Conda environment. PyTorch 2.3.0+ and CUDA 12.1 are recommended. xformers is also a dependency.
  • Prerequisites: The base models (VideoCrafter2, Open-Sora v1.2, ModelScope) must be downloaded or are fetched via Hugging Face. The HPSv2 library must also be installed.
  • Hardware: VideoCrafter2 inference requires ~16GB VRAM. Open-Sora inference needs ~40GB VRAM at 360p resolution. Open-Sora training on 360p, 2-second videos requires 48GB VRAM. ModelScope training can work with >14GB VRAM; experiments used 4x 40GB A100 GPUs.
  • Links: Website, Demo, arXiv
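
As a rough planning aid, the VRAM figures quoted above can be put into a small lookup. This is only a sketch: the dictionary keys and the `workloads_that_fit` helper are hypothetical, the figures are the approximate ones from the hardware note (the ModelScope value is a lower bound), and actual usage depends on resolution, batch size, and precision.

```python
# Approximate VRAM figures (GB) copied from the hardware note; names are illustrative.
VRAM_REQUIREMENTS_GB = {
    "VideoCrafter2 inference": 16,
    "Open-Sora inference (360p)": 40,
    "Open-Sora training (360p, 2s)": 48,
    "ModelScope training": 14,  # quoted as ">14GB", so treat this as a lower bound
}


def workloads_that_fit(available_gb: float) -> list:
    """Return workloads whose quoted VRAM figure does not exceed available_gb."""
    return [name for name, need in VRAM_REQUIREMENTS_GB.items() if need <= available_gb]


print(workloads_that_fit(24))
```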

Highlighted Details

  • Supports fine-tuning of VideoCrafter2, Open-Sora v1.2, and ModelScope text-to-video models.
  • Enables alignment for aesthetic quality, text-video similarity, and longer horizon video generation.
  • Learns more efficiently than gradient-free methods in terms of reward queries and compute.
  • Includes baseline implementations for DPO and DDPO.

Maintenance & Community

The project is associated with authors from institutions like CMU. Links to a website and Hugging Face demo are provided.

Licensing & Compatibility

The README does not explicitly state a license. Because the repository builds on other open-source projects, their licenses may affect commercial use.

Limitations & Caveats

Support for Stable Video Diffusion is listed as a planned feature but not yet implemented. The README notes potential issues with fp16 precision for certain Open-Sora configurations, recommending bf16 instead.
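
One plausible reason for such a recommendation, stated here as an assumption rather than something the README confirms, is dynamic range: fp16 has a maximum finite value of 65504, while bf16 keeps float32's 8-bit exponent at the cost of mantissa precision. The sketch below uses only Python's standard library. `fits_fp16` and `to_bf16` are hypothetical helpers, and the bf16 emulation truncates instead of rounding to nearest.

```python
import struct


def fits_fp16(x: float) -> bool:
    """True if x can be packed as an IEEE 754 half-precision (fp16) float."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


value = 70000.0  # larger than fp16's maximum finite value of 65504
print(fits_fp16(value))  # False
print(to_bf16(value))    # 69632.0
```

The value overflows fp16 but survives in bf16 with reduced precision, which is the trade-off bf16 makes.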

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
