FateZero by ChenyangQiQi

Zero-shot video editor (ICCV 2023 Oral) using attention fusion

Created 2 years ago

1,152 stars

Top 33.3% on SourcePulse

1 Expert Loves This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Project Summary

FateZero is a zero-shot framework for text-driven video editing. Users modify a video with a text prompt, with no per-prompt training and no manual masking. It builds on pre-trained diffusion models and keeps structure and motion temporally consistent across the edited frames, which makes it suitable for researchers and practitioners interested in advanced video manipulation.

How It Works

FateZero captures intermediate attention maps during the diffusion model's inversion process and fuses them into the editing pass, which preserves the source video's structure and motion. To further reduce semantic leakage from the source video, it blends self-attention maps using a mask derived from the source prompt's cross-attention features. Finally, the self-attention in the denoising UNet is reformed into spatial-temporal attention to keep frames consistent with one another.
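The blending step described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the repository's code: the function names `binary_mask` and `blend_self_attention`, the `threshold` parameter, and the choice of one mask value per query position are all hypothetical. It assumes the cross-attention weights for the edited word are thresholded into a 0/1 mask, which then selects between the self-attention stored during inversion (source) and the self-attention computed during editing.

```python
from typing import List

Matrix = List[List[float]]


def binary_mask(cross_attn: List[float], threshold: float) -> List[float]:
    """Threshold per-position cross-attention weights into a 0/1 mask.

    cross_attn holds one weight per spatial position for the edited word
    (a simplifying assumption made for this sketch).
    """
    return [1.0 if weight > threshold else 0.0 for weight in cross_attn]


def blend_self_attention(src: Matrix, edit: Matrix, mask: List[float]) -> Matrix:
    """Blend two self-attention maps row by row.

    Rows (query positions) where the mask is 1 take the editing-pass
    attention; all other rows keep the source attention from inversion.
    """
    return [
        [m * e + (1.0 - m) * s for s, e in zip(src_row, edit_row)]
        for src_row, edit_row, m in zip(src, edit, mask)
    ]


# Toy 2-position example: position 0 belongs to the edited object.
src_attn = [[1.0, 0.0], [0.0, 1.0]]
edit_attn = [[0.5, 0.5], [0.5, 0.5]]
mask = binary_mask([0.9, 0.1], threshold=0.3)
fused = blend_self_attention(src_attn, edit_attn, mask)
print(fused)  # [[0.5, 0.5], [0.0, 1.0]]
```

Only the masked row changes, so regions unrelated to the edited word keep the attention pattern of the source video.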

Quick Start & Requirements

Install via conda and pip install -r requirements.txt.
Uses CUDA 11 with fp16 acceleration; xformers is recommended for A100/3090 GPUs.
Downloading all data and checkpoints requires ~100GB and takes minutes.
Official Hugging Face demo and Colab notebook are available.

Highlighted Details

Zero-shot framework for text-based video editing.
Preserves structural and motion information via attention map fusion.
Achieves temporal consistency using spatial-temporal attention.
Supports style, attribute, and shape editing.
Demonstrates editing capabilities on various real-world videos.
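The spatial-temporal attention item above can be sketched as follows. This is an illustrative toy, not the project's implementation: it assumes each frame's tokens attend to their own frame plus a single reference frame (here the middle frame, chosen for illustration), and it omits the learned query, key, and value projections a real UNet layer would apply. The names `attend` and `spatial_temporal_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]  # one token vector per spatial position


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Frame, keys: Frame, values: Frame) -> Frame:
    """Scaled dot-product attention without learned projections."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    output = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        output.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return output


def spatial_temporal_attention(frames: List[Frame]) -> List[Frame]:
    """Let every frame attend to its own tokens and a reference frame's tokens.

    The reference is the middle frame (an assumption for this sketch), so
    all frames share some keys and values, which encourages consistency.
    """
    reference = frames[len(frames) // 2]
    return [attend(frame, frame + reference, frame + reference) for frame in frames]


# Three frames, two tokens each, two channels per token.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
result = spatial_temporal_attention(video)
print(len(result), len(result[0]), len(result[0][0]))  # 3 2 2
```

Because every frame draws keys and values from the same reference frame, the output for each frame is pulled toward shared content rather than computed in isolation.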

Maintenance & Community

The README describes the project as an actively maintained codebase for research work, although the most recent commit was 2 years ago. Feedback and discussion are welcome via GitHub issues, and contact information for key contributors is provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that xformers installation can be unstable. While low-cost settings for 16GB GPUs are provided, performance benchmarks for broader hardware configurations are still being developed.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days