Video editing framework (research paper) using diffusion models
Top 87.8% on sourcepulse
RAVE is a zero-shot framework for text-guided video editing, designed for researchers and practitioners seeking to modify videos with diffusion models. It enables fast, consistent, and high-quality edits across videos of any length by leveraging pre-trained text-to-image diffusion models without requiring additional training.
How It Works
RAVE employs a novel randomized noise shuffling strategy that capitalizes on spatio-temporal interactions between video frames. This approach allows for temporally consistent video generation at a faster pace than existing methods, while also being memory-efficient for longer videos. The framework supports a wide array of edits, from subtle attribute modifications to significant shape transformations.
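The released code implements this inside the diffusion pipeline; the standalone sketch below only illustrates the general idea of shuffling per-frame latents, tiling them into grids, denoising each grid jointly, and restoring frame order. The grid size, tensor shapes, and placeholder denoise callback are illustrative assumptions, not RAVE's actual implementation.

```python
import numpy as np

def shuffle_denoise_step(latents, rng, grid=3, denoise=None):
    """Toy illustration of randomized noise shuffling across grid cells.

    latents: array of shape (num_frames, C, H, W) holding per-frame noisy latents.
    Frames are randomly permuted, tiled into grid x grid canvases, denoised
    jointly so frames share spatial context, then mapped back to their original
    order. Illustrative sketch only, not the repository's code.
    """
    n, c, h, w = latents.shape
    perm = rng.permutation(n)          # random shuffle of frame order
    inv = np.argsort(perm)             # inverse permutation to restore order
    shuffled = latents[perm]

    # Tile consecutive frames into grid x grid canvases (pad with the last frame if needed).
    per_canvas = grid * grid
    pad = (-n) % per_canvas
    padded = np.concatenate([shuffled, np.repeat(shuffled[-1:], pad, axis=0)], axis=0)
    canvases = padded.reshape(-1, grid, grid, c, h, w)
    canvases = canvases.transpose(0, 3, 1, 4, 2, 5).reshape(-1, c, grid * h, grid * w)

    # Joint denoising over each canvas (placeholder: identity if no model is given).
    denoise = denoise or (lambda x: x)
    canvases = np.stack([denoise(cv) for cv in canvases])

    # Split canvases back into frames and undo the shuffle.
    frames = canvases.reshape(-1, c, grid, h, grid, w).transpose(0, 2, 4, 1, 3, 5)
    frames = frames.reshape(-1, c, h, w)[:n]
    return frames[inv]

rng = np.random.default_rng(0)
video_latents = rng.standard_normal((16, 4, 8, 8)).astype(np.float32)
out = shuffle_denoise_step(video_latents, rng)
print(out.shape)  # (16, 4, 8, 8)
```

Repeating such a shuffle at every denoising step is what lets information propagate across all frames without any training.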
Quick Start & Requirements
- Create the environment with `conda create -n rave python=3.8`, activate it with `conda activate rave`, install dependencies with `pip install -r requirements.txt`, and install PyTorch (`2.0.1+cu118`) and xformers (`0.0.20`).
- Run `python webui.py` for a Gradio-based web demo.
- For the script interface, place videos in `data/mp4_videos`, prepare a config file, and run `python scripts/run_experiment.py [PATH OF CONFIG FILE]` (a batch-run sketch follows this list).
- Run `bash CIVIT_AI/civit_ai.sh [CIVITAI_MODEL_ID]` to convert CivitAI models.
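For running several experiments in one go, a thin wrapper can loop over config files and invoke the documented entry point. The `configs/` directory and `*.yaml` naming below are illustrative assumptions; only `scripts/run_experiment.py [PATH OF CONFIG FILE]` is documented in the README.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: one config file per experiment under configs/.
# Only the run_experiment.py invocation comes from the README; the rest is illustrative.
config_dir = Path("configs")

for config_path in sorted(config_dir.glob("*.yaml")):
    print(f"Running experiment for {config_path}")
    subprocess.run(
        ["python", "scripts/run_experiment.py", str(config_path)],
        check=True,  # stop on the first failed run
    )
```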
Highlighted Details
Maintenance & Community
The project is maintained by RehgLab and was initiated by Ozgur Kara, who is also the contact for further questions or discussions.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The dataset is slated for future release. The README notes that the videos shown on GitHub are heavily compressed, with full-quality versions available on the project webpage. The code was tested only against the dependency versions listed above, so other environments may require adjustments.