vid2vid-zero by baaivision

Research code for zero-shot video editing using image diffusion models

created 2 years ago
355 stars

Top 79.7% on sourcepulse

View on GitHub
Project Summary

vid2vid-zero enables zero-shot video editing by leveraging pre-trained image diffusion models without requiring video-specific training. It targets researchers and practitioners in computer vision and generative AI who need to modify video content based on textual descriptions. The primary benefit is the ability to edit attributes, subjects, and scenes in real-world videos with high fidelity and temporal consistency.

How It Works

The method employs three core modules: null-text inversion to align text prompts with the input video, cross-frame modeling to maintain temporal consistency across frames, and spatial regularization to preserve fidelity to the original video. It exploits the dynamic nature of attention mechanisms in diffusion models to perform bidirectional temporal modeling at test time, avoiding the need for any video-specific training.
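Cross-frame modeling can be pictured as self-attention whose keys and values are gathered from every frame of the clip rather than from a single frame. The sketch below is a minimal, hypothetical PyTorch illustration of that idea (single-head, batch-major layout assumed); it is not the repository's implementation.

```python
# Hypothetical sketch of dense cross-frame ("spatial-temporal") attention.
# Assumes tensors are laid out batch-major, i.e. the frames of one video
# occupy consecutive rows of the flattened (batch * num_frames) dimension.
# Multi-head splitting is omitted for brevity.
import torch
import torch.nn.functional as F

def cross_frame_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                          num_frames: int) -> torch.Tensor:
    """q, k, v: (batch * num_frames, seq_len, dim) per-frame projections."""
    bf, seq_len, dim = q.shape
    b = bf // num_frames

    # Pool keys/values from all frames of one video into a shared context:
    # (b, num_frames * seq_len, dim).
    k = k.reshape(b, num_frames * seq_len, dim)
    v = v.reshape(b, num_frames * seq_len, dim)

    # Broadcast the shared context back so each frame's queries attend to it.
    k = k.unsqueeze(1).expand(b, num_frames, -1, dim).reshape(bf, -1, dim)
    v = v.unsqueeze(1).expand(b, num_frames, -1, dim).reshape(bf, -1, dim)

    # Standard scaled dot-product attention over the cross-frame context,
    # giving every frame a bidirectional view of the whole clip.
    return F.scaled_dot_product_attention(q, k, v)
```

Because the shared keys and values are computed from the input video at inference time, this temporal coupling requires no extra training.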

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, PyTorch. xformers is highly recommended for performance. Requires pre-trained Stable Diffusion weights (v1-4 by default; see the download sketch after this list).
  • Run: accelerate launch test_vid2vid_zero.py --config path/to/config
  • Demo: Local Gradio demo available via python app.py or online at Hugging Face Spaces.
  • Docs: Hugging Face Spaces Demo
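The pre-trained weights mentioned above can be fetched ahead of time with Hugging Face diffusers. A minimal sketch, assuming the standard Hub id for Stable Diffusion v1-4; the local save path is illustrative and not something the repository requires:

```python
# Hypothetical pre-download of Stable Diffusion v1-4 via diffusers.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.save_pretrained("./checkpoints/stable-diffusion-v1-4")  # illustrative path
```

The config passed to test_vid2vid_zero.py is then expected to point at whichever local copy or Hub id of the weights you use.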

Highlighted Details

  • Zero-shot video editing using off-the-shelf image diffusion models.
  • No video-specific training required.
  • Achieves promising results in editing attributes, subjects, and places.
  • Employs null-text inversion, cross-frame modeling, and spatial regularization.
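Null-text inversion is a published technique (Mokady et al.) that the project builds on: the unconditional ("null") text embedding is optimized per denoising step so that classifier-free guidance reproduces the DDIM-inverted latents of the source frames, keeping reconstruction faithful while still allowing text-driven edits. A minimal conceptual sketch, assuming a diffusers-style UNet and DDIM scheduler; the function name, arguments, and loop structure are illustrative only:

```python
# Hypothetical sketch of null-text inversion (not the repository's code).
# Assumes scheduler.set_timesteps(...) has already been called and that
# inverted_latents holds len(scheduler.timesteps) + 1 latents from z_T to z_0.
import torch
import torch.nn.functional as F

def null_text_inversion(unet, scheduler, inverted_latents, text_emb, uncond_emb,
                        guidance_scale=7.5, steps_per_t=10, lr=1e-2):
    null_embs = []
    latent = inverted_latents[0]  # start from the fully inverted latent z_T
    for i, t in enumerate(scheduler.timesteps):
        target = inverted_latents[i + 1]  # latent the guided step should land on
        with torch.no_grad():  # text branch does not depend on the null embedding
            noise_text = unet(latent, t, encoder_hidden_states=text_emb).sample
        uncond = uncond_emb.clone().requires_grad_(True)
        opt = torch.optim.Adam([uncond], lr=lr)
        for _ in range(steps_per_t):
            noise_uncond = unet(latent, t, encoder_hidden_states=uncond).sample
            noise = noise_uncond + guidance_scale * (noise_text - noise_uncond)
            pred = scheduler.step(noise, t, latent).prev_sample
            loss = F.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
        null_embs.append(uncond.detach())
        with torch.no_grad():  # advance one step using the optimized embedding
            noise_uncond = unet(latent, t, encoder_hidden_states=uncond).sample
            noise = noise_uncond + guidance_scale * (noise_text - noise_uncond)
            latent = scheduler.step(noise, t, latent).prev_sample
    return null_embs
```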

Maintenance & Community

The project is associated with BAAI Vision Team and ZJU. Contact information for hiring and collaboration is provided.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not list explicit limitations or known bugs. Code was released in April 2023, and repository activity has been minimal since (last commit two years ago).

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days

Explore Similar Projects

Starred by Chenlin Meng (Cofounder of Pika), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 1 more.

Tune-A-Video by showlab
Text-to-video generation via diffusion model fine-tuning
4k stars · created 2 years ago · updated 1 year ago