DiffuEraser by lixiaowen-xw

Diffusion model for video inpainting, excelling in content completeness

Created 8 months ago
515 stars

Top 60.8% on SourcePulse

Project Summary

DiffuEraser is a diffusion model designed for video inpainting, offering enhanced content completeness and temporal consistency over existing methods. It targets researchers and practitioners in computer vision and video processing seeking advanced tools for video editing and restoration.

How It Works

DiffuEraser pairs a UNet-based denoising network with a BrushNet branch whose features are injected through zero-initialized convolutions. Temporal attention layers are added alongside the self-attention and cross-attention blocks to improve temporal consistency. Priors from ProPainter provide initialization and weak conditioning to reduce artifacts, and an expanded temporal receptive field enables long-sequence inference.
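
A minimal PyTorch sketch of the two mechanisms described above: a zero-initialized convolution injecting BrushNet-branch features into the UNet, and self-attention across the frame axis. Shapes, module names, and arguments are illustrative assumptions, not DiffuEraser's actual code.

```python
import torch
import torch.nn as nn

class ZeroConvInjection(nn.Module):
    """Add branch features through a 1x1 conv initialized to zero, so the
    auxiliary branch contributes nothing at the start of training."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, unet_feat: torch.Tensor, branch_feat: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch*frames, channels, height, width)
        return unet_feat + self.proj(branch_feat)

class TemporalAttention(nn.Module):
    """Self-attention over the frame axis at each spatial location."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        bf, c, h, w = x.shape
        b = bf // num_frames
        # (B*F, C, H, W) -> (B*H*W, F, C): one attention sequence per pixel
        t = x.reshape(b, num_frames, c, h, w).permute(0, 3, 4, 1, 2).reshape(b * h * w, num_frames, c)
        out, _ = self.attn(t, t, t)
        out = out.reshape(b, h, w, num_frames, c).permute(0, 3, 4, 1, 2).reshape(bf, c, h, w)
        return x + out  # residual connection preserves spatial content

# Usage with assumed latent shapes: 2 clips of 16 frames each
frames = torch.randn(2 * 16, 64, 45, 80)
frames = TemporalAttention(64)(frames, num_frames=16)
```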

Quick Start & Requirements

  • Install: Clone the repository and install dependencies with pip install -r requirements.txt inside a Python 3.9.19 Conda environment.
  • Pretrained Models: Requires downloading weights from Hugging Face/ModelScope: Stable Diffusion v1.5 (the full repository exceeds 30GB, though only ~4GB of essential components are needed), PCM_Weights, ProPainter, and sd-vae-ft-mse; see the sketch after this list. An optional motion adapter is available for training.
  • Inference: Run python run_diffueraser.py after placing the weights and configuring the input video and mask paths.
  • Resources: Inference requires significant GPU memory (about 12GB at 640x360, 33GB at 1280x720) and long runtimes.
  • Docs: Project Page, ModelScope Demo
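
As noted in the Pretrained Models item, a hedged sketch of fetching the Hugging Face-hosted weights with huggingface_hub; the repo IDs and target directories are assumptions, so consult the repository README for the exact sources (some weights are also distributed via ModelScope).

```python
from huggingface_hub import snapshot_download

WEIGHTS = {
    # Repo IDs below are assumptions, not the project's documented sources.
    "stable-diffusion-v1-5/stable-diffusion-v1-5": "weights/stable-diffusion-v1-5",
    "stabilityai/sd-vae-ft-mse": "weights/sd-vae-ft-mse",
    "wangfuyun/PCM_Weights": "weights/PCM_Weights",
}

for repo_id, local_dir in WEIGHTS.items():
    # snapshot_download fetches the whole repo; pass allow_patterns to
    # restrict the SD v1.5 download to the ~4GB of essential components.
    snapshot_download(repo_id=repo_id, local_dir=local_dir)

# ProPainter weights are obtained separately; see the repository README.
```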

Highlighted Details

  • Outperforms the state-of-the-art ProPainter in content completeness and temporal consistency.
  • Integrates the BrushNet and AnimateDiff architectures.
  • Supports training and evaluation pipelines for custom datasets.
  • Inference code was released in January 2025, training code in March 2025.

Maintenance & Community

The project is developed by researchers from Tongyi Lab, Alibaba Group. Community interaction is encouraged via GitHub Discussions.

Licensing & Compatibility

Licensed under Apache License 2.0, with the caveat that users must comply with ProPainter's license because it is used as the prior model. This may restrict commercial use or linking with closed-source applications.

Limitations & Caveats

The project relies heavily on large pretrained models, requiring substantial disk space and GPU resources. The license of the prior model (ProPainter) may introduce compatibility issues for certain commercial or closed-source use cases.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 30 days

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

Explore Similar Projects

METER by zdou0830

Multimodal framework for vision-and-language transformer research

373 stars
Created 3 years ago
Updated 2 years ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

NExT-GPT by NExT-GPT

Any-to-any multimodal LLM research paper

4k stars
Created 2 years ago
Updated 4 months ago