Zero-shot video editing with pre-trained image diffusion models
vid2vid-zero enables zero-shot video editing by leveraging pre-trained image diffusion models without requiring video-specific training. It targets researchers and practitioners in computer vision and generative AI who need to modify video content based on textual descriptions. The primary benefit is the ability to edit attributes, subjects, and scenes in real-world videos with high fidelity and temporal consistency.
How It Works
The method employs three core modules: null-text inversion for aligning text prompts with video content, cross-frame modeling for maintaining temporal consistency across video frames, and spatial regularization to preserve the original video's fidelity. It utilizes the dynamic nature of attention mechanisms within diffusion models for bidirectional temporal modeling at inference time, avoiding the need for explicit video training.
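To make the cross-frame idea concrete, below is a minimal sketch of bidirectional cross-frame attention in PyTorch: queries stay per-frame while keys and values are gathered from all frames, so every frame can attend both forward and backward in time at inference. The function name, tensor shapes, and weights are illustrative assumptions, not the repository's actual module.

import torch

def cross_frame_attention(hidden_states, to_q, to_k, to_v):
    # hidden_states: (F, N, C) -- one row of N tokens per video frame
    f, n, c = hidden_states.shape
    q = to_q(hidden_states)                      # queries stay per-frame: (F, N, C)
    # Keys/values are shared across all frames, so each frame attends to
    # every other frame in both temporal directions.
    kv = hidden_states.reshape(1, f * n, c).expand(f, -1, -1)
    k = to_k(kv)                                 # (F, F*N, C)
    v = to_v(kv)                                 # (F, F*N, C)
    attn = torch.softmax(q @ k.transpose(-1, -2) / c ** 0.5, dim=-1)
    return attn @ v                              # (F, N, C), temporally mixed

# Example: 8 frames of 64 latent tokens with 320 channels and random weights
dim = 320
to_q = torch.nn.Linear(dim, dim, bias=False)
to_k = torch.nn.Linear(dim, dim, bias=False)
to_v = torch.nn.Linear(dim, dim, bias=False)
out = cross_frame_attention(torch.randn(8, 64, dim), to_q, to_k, to_v)
print(out.shape)  # torch.Size([8, 64, 320])

Because the key/value pool covers all frames rather than only past ones, the temporal modeling is bidirectional, which is what allows consistency to emerge at inference time without any video-specific training.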
Quick Start & Requirements
pip install -r requirements.txt
xformers is highly recommended for performance. Pre-trained Stable Diffusion weights are required (v1-4 by default). Run video editing with:
accelerate launch test_vid2vid_zero.py --config path/to/config
Launch the demo locally with:
python app.py
or try it online at Hugging Face Spaces.
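As a quick sanity check before launching the editing script, the default weights can be loaded with the Hugging Face diffusers library. This is a hedged sketch of a prerequisite check; the model id below is the public CompVis v1-4 release and may differ from what the repository's configs point to.

import torch
from diffusers import StableDiffusionPipeline

# Model id assumed here: the public CompVis v1-4 checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # halves memory; drop this on CPU-only machines
)
pipe = pipe.to("cuda")
# If xformers is installed, memory-efficient attention can be enabled:
# pipe.enable_xformers_memory_efficient_attention()
print("Loaded UNet latent size:", pipe.unet.config.sample_size)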
Maintenance & Community
The project is associated with BAAI Vision Team and ZJU. Contact information for hiring and collaboration is provided.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify any explicit limitations or known bugs. The project appears to be relatively new, with code released in April 2023.