VADER  by mihirp1998

Research-paper code for fine-tuning video diffusion models via reward gradients

created 1 year ago
294 stars

Top 90.9% on sourcepulse

Project Summary

This repository provides VADER (Video Diffusion Alignment via Reward Gradients), a method for fine-tuning video diffusion models to align with specific downstream tasks like aesthetic generation or text-video coherence. It targets researchers and developers working with foundational video diffusion models, offering an efficient alternative to supervised fine-tuning by leveraging pre-trained reward models.

How It Works

VADER backpropagates gradients from pre-trained reward models (e.g., HPS, PickScore, YOLO) through the generated pixels into the video diffusion model. Because the reward gradient carries a signal for every pixel, learning stays efficient in the large search space of video generation. This lets the model be aligned with objectives such as aesthetics, text-video similarity, and longer video generation without extensive curated datasets.
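The idea of ascending a reward gradient through the generated pixels can be sketched with a toy example. This is a minimal illustration, not VADER's implementation: the one-parameter "generator", the quadratic reward, and every name below (`generate`, `reward`, `finetune`, and so on) are hypothetical stand-ins for a diffusion model and a learned reward model.

```python
import random

def generate(theta, noise):
    """Toy 'generator' (assumption): scales each noise value by theta to produce 'pixels'."""
    return [theta * z for z in noise]

def reward(pixels, target=1.0):
    """Toy differentiable reward (assumption): higher when pixels are close to `target`."""
    return -sum((p - target) ** 2 for p in pixels) / len(pixels)

def reward_grad_wrt_pixels(pixels, target=1.0):
    """dR/dpixel_i for every pixel: one gradient entry per pixel (the dense signal)."""
    n = len(pixels)
    return [-2.0 * (p - target) / n for p in pixels]

def reward_grad_wrt_theta(theta, noise, target=1.0):
    """Chain rule: dR/dtheta = sum_i dR/dpixel_i * dpixel_i/dtheta, where dpixel_i/dtheta = z_i."""
    pixels = generate(theta, noise)
    pixel_grads = reward_grad_wrt_pixels(pixels, target)
    return sum(g * z for g, z in zip(pixel_grads, noise))

def finetune(theta=0.0, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the reward, sampling fresh noise at every step."""
    rng = random.Random(seed)
    for _ in range(steps):
        noise = [rng.uniform(0.5, 1.5) for _ in range(8)]
        theta += lr * reward_grad_wrt_theta(theta, noise)
    return theta
```

Calling `finetune()` moves `theta` from 0 toward the value that maximizes the expected toy reward. A gradient-free method would only see the scalar `reward(...)` per sample, whereas this update uses one gradient entry per pixel.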

Quick Start & Requirements

  • Installation: Each supported model (VideoCrafter, Open-Sora, ModelScope) needs its own Conda environment. PyTorch 2.3.0+ and CUDA 12.1 are recommended. xformers is also a dependency.
  • Prerequisites: The base models (VideoCrafter2, Open-Sora v1.2, ModelScope) must be downloaded manually or are fetched from Hugging Face. The HPSv2 library must be installed.
  • Hardware: VideoCrafter2 inference requires ~16GB VRAM. Open-Sora inference needs ~40GB VRAM at 360p. Open-Sora training at 360p/2s requires 48GB VRAM. ModelScope training can run with more than 14GB VRAM; 4x40GB A100s were used for experiments.
  • Links: Website, Demo, arXiv
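The hardware figures above can be collected into a small lookup for a quick feasibility check before setting up an environment. This is a hypothetical helper, not part of the repository; the numbers are the approximate values quoted in this summary, and the names `VRAM_GB` and `fits` are illustrative.

```python
# Approximate VRAM needs in GB, as quoted in this summary (not measured here).
# Each entry is (gigabytes, strict): strict=True means "more than", not "at least".
VRAM_GB = {
    ("VideoCrafter2", "inference"): (16, False),
    ("Open-Sora", "inference"): (40, False),   # at 360p
    ("Open-Sora", "training"): (48, False),    # at 360p/2s
    ("ModelScope", "training"): (14, True),    # summary says ">14GB"
}

def fits(model, task, gpu_vram_gb):
    """Return True or False when a figure is known, or None when the summary gives none."""
    entry = VRAM_GB.get((model, task))
    if entry is None:
        return None
    need, strict = entry
    return gpu_vram_gb > need if strict else gpu_vram_gb >= need
```

For example, `fits("Open-Sora", "training", 40)` is `False`, while `fits("VideoCrafter2", "inference", 24)` is `True`. Combinations the summary does not cover, such as ModelScope inference, return `None` rather than a guess.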

Highlighted Details

  • Supports fine-tuning of VideoCrafter2, Open-Sora v1.2, and ModelScope text-to-video models.
  • Enables alignment for aesthetic quality, text-video similarity, and longer horizon video generation.
  • Learns more efficiently than gradient-free methods, measured in both reward queries and compute.
  • Includes baseline implementations for DPO and DDPO.

Maintenance & Community

The project is associated with authors from institutions like CMU. Links to a website and Hugging Face demo are provided.

Licensing & Compatibility

The README does not explicitly state a license. Because the repository builds on other open-source projects, their licenses may constrain commercial use.

Limitations & Caveats

Support for Stable Video Diffusion is listed as a planned feature but not yet implemented. The README notes potential issues with fp16 precision for certain Open-Sora configurations, recommending bf16 instead.
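This summary does not say why fp16 causes problems in those Open-Sora configurations. One general difference between the two formats is dynamic range: fp16 overflows above 65504, while bf16 keeps the exponent range of float32 at lower precision. The sketch below illustrates only that general property, using the standard library. The bf16 emulation truncates a float32 encoding, which is an assumption made for simplicity (real hardware typically rounds), and the function names are hypothetical.

```python
import struct

def to_fp16(x):
    """Round-trip x through IEEE half precision; return inf if x is out of range."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Python's struct refuses to pack values beyond the fp16 range.
        return float("inf")

def to_bf16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    top_two_bytes = struct.pack(">f", x)[:2]
    return struct.unpack(">f", top_two_bytes + b"\x00\x00")[0]
```

With these helpers, `to_fp16(70000.0)` returns `inf`, while `to_bf16(70000.0)` returns a finite value within about 1% of the input.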

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 22 stars in the last 90 days
