Zero-shot video editing with pre-trained image diffusion models
vid2vid-zero enables zero-shot video editing by leveraging pre-trained image diffusion models without requiring video-specific training. It targets researchers and practitioners in computer vision and generative AI who need to modify video content based on textual descriptions. The primary benefit is the ability to edit attributes, subjects, and scenes in real-world videos with high fidelity and temporal consistency.
How It Works
The method employs three core modules: null-text inversion for aligning text prompts with video content, cross-frame modeling for maintaining temporal consistency across video frames, and spatial regularization to preserve the original video's fidelity. It utilizes the dynamic nature of attention mechanisms within diffusion models for bidirectional temporal modeling at inference time, avoiding the need for explicit video training.
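To make the cross-frame idea concrete, below is a minimal sketch of bidirectional cross-frame attention in PyTorch: queries stay per-frame while keys and values are gathered from all frames, so every frame can attend both forward and backward in time at inference. The function name, tensor shapes, and weights are illustrative assumptions, not the repository's actual module.

import torch

def cross_frame_attention(hidden_states, to_q, to_k, to_v):
    # hidden_states: (F, N, C) -- one row of N tokens per video frame
    f, n, c = hidden_states.shape
    q = to_q(hidden_states)                      # queries stay per-frame: (F, N, C)
    # Keys/values are shared across all frames, so each frame attends to
    # every other frame in both temporal directions.
    kv = hidden_states.reshape(1, f * n, c).expand(f, -1, -1)
    k = to_k(kv)                                 # (F, F*N, C)
    v = to_v(kv)                                 # (F, F*N, C)
    attn = torch.softmax(q @ k.transpose(-1, -2) / c ** 0.5, dim=-1)
    return attn @ v                              # (F, N, C), temporally mixed

# Example: 8 frames of 64 latent tokens with 320 channels and random weights
dim = 320
to_q = torch.nn.Linear(dim, dim, bias=False)
to_k = torch.nn.Linear(dim, dim, bias=False)
to_v = torch.nn.Linear(dim, dim, bias=False)
out = cross_frame_attention(torch.randn(8, 64, dim), to_q, to_k, to_v)
print(out.shape)  # torch.Size([8, 64, 320])

Because the key/value pool covers all frames rather than only past ones, the temporal modeling is bidirectional, which is what allows consistency to emerge at inference time without any video-specific training.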
Quick Start & Requirements
pip install -r requirements.txt
xformers is highly recommended for performance. Pre-trained Stable Diffusion weights are required (v1-4 by default). Run video editing with:
accelerate launch test_vid2vid_zero.py --config path/to/config
Launch the demo locally with:
python app.py
or try it online at Hugging Face Spaces.
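As a quick sanity check before launching the editing script, the default weights can be loaded with the Hugging Face diffusers library. This is a hedged sketch of a prerequisite check; the model id below is the public CompVis v1-4 release and may differ from what the repository's configs point to.

import torch
from diffusers import StableDiffusionPipeline

# Model id assumed here: the public CompVis v1-4 checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # halves memory; drop this on CPU-only machines
)
pipe = pipe.to("cuda")
# If xformers is installed, memory-efficient attention can be enabled:
# pipe.enable_xformers_memory_efficient_attention()
print("Loaded UNet latent size:", pipe.unet.config.sample_size)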
Maintenance & Community
The project is associated with BAAI Vision Team and ZJU. Contact information for hiring and collaboration is provided.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify any explicit limitations or known bugs. The project appears to be relatively new, with code released in April 2023.