DiffuEraser by lixiaowen-xw

Diffusion model for video inpainting, excelling in content completeness

created 6 months ago
485 stars

Top 64.2% on sourcepulse

Project Summary

DiffuEraser is a diffusion model designed for video inpainting, offering enhanced content completeness and temporal consistency over existing methods. It targets researchers and practitioners in computer vision and video processing seeking advanced tools for video editing and restoration.

How It Works

DiffuEraser employs a UNet-based denoising architecture augmented with a BrushNet branch for feature integration via zero convolution. Temporal attention mechanisms are incorporated into self-attention and cross-attention layers to improve temporal consistency. Prior information is used for initialization and conditioning to reduce artifacts, and expanded temporal receptive fields are leveraged for long-sequence inference.
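The two core ideas above, injecting an auxiliary branch's features through a zero-initialized convolution and applying attention along the frame axis, can be sketched as a minimal PyTorch toy. This is an illustration of the general technique, not the project's actual code; all class and tensor names here are hypothetical:

```python
import torch
import torch.nn as nn


def zero_module(module):
    # Zero-initialise all parameters so the auxiliary (BrushNet-style)
    # branch contributes nothing at the start of training.
    for p in module.parameters():
        nn.init.zeros_(p)
    return module


class TemporalAttention(nn.Module):
    """Self-attention across the frame axis: each spatial location attends
    to its own history, which encourages temporally consistent features."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        # Fold the spatial dimensions into the batch so that attention
        # runs over frames only.
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)
        out, _ = self.attn(seq, seq, seq)
        return out.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)


# Main-branch features plus an auxiliary branch injected via zero convolution.
main = torch.randn(1, 8, 32, 16, 16)   # (batch, frames, channels, H, W)
aux = torch.randn(1, 8, 32, 16, 16)

zero_conv = zero_module(nn.Conv2d(32, 32, kernel_size=1))
injected = zero_conv(aux.flatten(0, 1)).view_as(aux)  # all zeros at init
fused = main + injected                               # equals `main` before training

smoothed = TemporalAttention(32)(fused)               # same shape as `fused`
```

Because the zero convolution outputs exactly zero before any training step, the fused features initially match the main branch, so the auxiliary branch can be learned without disturbing the pretrained backbone.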

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt within a Python 3.9.19 Conda environment.
  • Pretrained Models: Requires downloading models from Hugging Face/ModelScope, including Stable Diffusion v1.5 (full download over 30GB; essential components ~4GB), PCM_Weights, ProPainter, and sd-vae-ft-mse. An optional motion adapter is available for training.
  • Inference: Run python run_diffueraser.py after setting up weights and configuring input video/mask paths.
  • Resources: Inference demands significant GPU memory (12GB for 640x360, 33GB for 1280x720) and can be time-consuming for long or high-resolution videos.
  • Docs: Project Page, ModelScope Demo

Highlighted Details

  • Outperforms the state-of-the-art ProPainter in content completeness and temporal consistency.
  • Integrates BrushNet and AnimateDiff architectures.
  • Supports training and evaluation pipelines for custom datasets.
  • Inference code released January 2025, training code March 2025.

Maintenance & Community

The project is developed by researchers from Tongyi Lab, Alibaba Group. Community interaction is encouraged via GitHub Discussions.

Licensing & Compatibility

Licensed under Apache License 2.0, with the caveat that users must also comply with ProPainter's license because it is used as a prior model. This may restrict commercial use or linking with closed-source applications.

Limitations & Caveats

The project relies heavily on large pretrained models, requiring substantial disk space and GPU resources. The licensing of the prior model (ProPainter) may introduce compatibility issues for certain commercial or closed-source use cases.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 93 stars in the last 90 days
