Video editing framework (research paper) using diffusion models
Top 87.8% on sourcepulse
RAVE is a zero-shot framework for text-guided video editing, designed for researchers and practitioners seeking to modify videos with diffusion models. It enables fast, consistent, and high-quality edits across videos of any length by leveraging pre-trained text-to-image diffusion models without requiring additional training.
How It Works
RAVE employs a novel randomized noise shuffling strategy that capitalizes on spatio-temporal interactions between video frames. This approach allows for temporally consistent video generation at a faster pace than existing methods, while also being memory-efficient for longer videos. The framework supports a wide array of edits, from subtle attribute modifications to significant shape transformations.
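The released code implements this inside the diffusion pipeline; the standalone sketch below only illustrates the general idea of shuffling per-frame latents, tiling them into grids, denoising each grid jointly, and restoring frame order. The grid size, tensor shapes, and placeholder denoise callback are illustrative assumptions, not RAVE's actual implementation.

```python
import numpy as np

def shuffle_denoise_step(latents, rng, grid=3, denoise=None):
    """Toy illustration of randomized noise shuffling across grid cells.

    latents: array of shape (num_frames, C, H, W) holding per-frame noisy latents.
    Frames are randomly permuted, tiled into grid x grid canvases, denoised
    jointly so frames share spatial context, then mapped back to their original
    order. Illustrative sketch only, not the repository's code.
    """
    n, c, h, w = latents.shape
    perm = rng.permutation(n)          # random shuffle of frame order
    inv = np.argsort(perm)             # inverse permutation to restore order
    shuffled = latents[perm]

    # Tile consecutive frames into grid x grid canvases (pad with the last frame if needed).
    per_canvas = grid * grid
    pad = (-n) % per_canvas
    padded = np.concatenate([shuffled, np.repeat(shuffled[-1:], pad, axis=0)], axis=0)
    canvases = padded.reshape(-1, grid, grid, c, h, w)
    canvases = canvases.transpose(0, 3, 1, 4, 2, 5).reshape(-1, c, grid * h, grid * w)

    # Joint denoising over each canvas (placeholder: identity if no model is given).
    denoise = denoise or (lambda x: x)
    canvases = np.stack([denoise(cv) for cv in canvases])

    # Split canvases back into frames and undo the shuffle.
    frames = canvases.reshape(-1, c, grid, h, grid, w).transpose(0, 2, 4, 1, 3, 5)
    frames = frames.reshape(-1, c, h, w)[:n]
    return frames[inv]

rng = np.random.default_rng(0)
video_latents = rng.standard_normal((16, 4, 8, 8)).astype(np.float32)
out = shuffle_denoise_step(video_latents, rng)
print(out.shape)  # (16, 4, 8, 8)
```

Repeating such a shuffle at every denoising step is what lets information propagate across all frames without any training.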
Quick Start & Requirements
- Create the environment with `conda create -n rave python=3.8`, activate it with `conda activate rave`, install dependencies with `pip install -r requirements.txt`, and install PyTorch (`2.0.1+cu118`) and xformers (`0.0.20`).
- Run `python webui.py` for a Gradio-based web demo.
- For the script interface, place videos in `data/mp4_videos`, prepare a config file, and run `python scripts/run_experiment.py [PATH OF CONFIG FILE]` (a batch-run sketch follows this list).
- Run `bash CIVIT_AI/civit_ai.sh [CIVITAI_MODEL_ID]` to convert CivitAI models.
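For running several experiments in one go, a thin wrapper can loop over config files and invoke the documented entry point. The `configs/` directory and `*.yaml` naming below are illustrative assumptions; only `scripts/run_experiment.py [PATH OF CONFIG FILE]` is documented in the README.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: one config file per experiment under configs/.
# Only the run_experiment.py invocation comes from the README; the rest is illustrative.
config_dir = Path("configs")

for config_path in sorted(config_dir.glob("*.yaml")):
    print(f"Running experiment for {config_path}")
    subprocess.run(
        ["python", "scripts/run_experiment.py", str(config_path)],
        check=True,  # stop on the first failed run
    )
```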
Highlighted Details
Maintenance & Community
The project is maintained by RehgLab and was initiated by Ozgur Kara, who is also the contact for further questions or discussions.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The dataset is slated for future release. The README notes that the videos shown on GitHub are heavily compressed, with full-quality versions available on the project webpage. The code was tested only against the dependency versions listed above, so other environments may require adjustments.