Diffusion model for video inpainting, excelling in content completeness and temporal consistency
Top 64.2% on sourcepulse
DiffuEraser is a diffusion model designed for video inpainting, offering enhanced content completeness and temporal consistency over existing methods. It targets researchers and practitioners in computer vision and video processing seeking advanced tools for video editing and restoration.
How It Works
DiffuEraser employs a UNet-based denoising architecture augmented with a BrushNet branch, whose features are injected into the main denoising UNet via zero convolutions. Temporal attention mechanisms are incorporated after the self-attention and cross-attention layers to improve temporal consistency. Prior information, generated by ProPainter, is used for initialization and conditioning to reduce artifacts, and expanded temporal receptive fields support consistent long-sequence inference.
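The sketch below (PyTorch; module and variable names are illustrative, not taken from the repository) shows the two mechanisms in miniature: a zero-initialized 1x1 convolution that lets the BrushNet-style auxiliary branch feed features into the main UNet without perturbing it at the start of training, and a temporal attention block that attends across frames at each spatial location.

import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with weights and bias initialized to zero, so the
    # auxiliary branch contributes nothing at the start of training.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class TemporalAttention(nn.Module):
    # Self-attention over the frame axis, run independently at each spatial
    # position, to keep features consistent across time.
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold spatial positions into the batch so attention runs over frames.
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        normed = self.norm(seq)
        seq = seq + self.attn(normed, normed, normed)[0]  # pre-norm residual
        return seq.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)

# Zero-convolution feature injection from the auxiliary branch:
inject = zero_conv(320)
unet_feat = torch.randn(2, 320, 64, 64)    # main-branch feature map
brush_feat = torch.randn(2, 320, 64, 64)   # auxiliary-branch feature map
fused = unet_feat + inject(brush_feat)     # equals unet_feat at initialization

# Temporal attention over an 8-frame clip:
temporal = TemporalAttention(320)
video_feat = torch.randn(1, 8, 320, 16, 16)
out = temporal(video_feat)                 # same shape as video_feat

Because the injection convolution starts at zero, the fused features initially equal the main-branch features, so the auxiliary branch's influence is learned gradually rather than imposed at initialization.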
Quick Start & Requirements
Install dependencies within a Python 3.9.19 Conda environment:

pip install -r requirements.txt

Then download the pretrained weights, configure the input video and mask paths, and run inference:

python run_diffueraser.py
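As a rough illustration, the configuration step amounts to pointing the script at a video, its mask, and the downloaded weights. The variable names and paths below are hypothetical; run_diffueraser.py defines its own:

# Hypothetical configuration sketch; adapt names and paths to the actual script.
input_video = "examples/example1/video.mp4"  # assumed: video containing regions to erase
input_mask = "examples/example1/mask.mp4"    # assumed: per-frame masks of those regions
weights_dir = "weights"                      # assumed: directory of downloaded model weights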
Maintenance & Community
The project is developed by researchers from Tongyi Lab, Alibaba Group. Community interaction is encouraged via GitHub Discussions.
Licensing & Compatibility
Licensed under Apache License 2.0, with the caveat that users must also comply with ProPainter's license, since ProPainter serves as the prior model. This may restrict commercial use or linking with closed-source applications.
Limitations & Caveats
The project relies heavily on large pretrained models, requiring substantial disk space and GPU resources. The license of the prior model (ProPainter) may introduce compatibility issues for certain commercial or closed-source use cases.
Last updated: 3 months ago; currently inactive.