DiffuEraser by lixiaowen-xw

Diffusion model for video inpainting, excelling in content completeness

Created 8 months ago
515 stars

Top 60.8% on SourcePulse

Project Summary

DiffuEraser is a diffusion model designed for video inpainting, offering enhanced content completeness and temporal consistency over existing methods. It targets researchers and practitioners in computer vision and video processing seeking advanced tools for video editing and restoration.

How It Works

DiffuEraser pairs a UNet-based denoising network with a BrushNet branch whose features are injected through zero-initialized convolutions. Temporal attention layers are added alongside the self-attention and cross-attention blocks to improve temporal consistency. Priors from ProPainter provide initialization and weak conditioning to reduce artifacts, and an expanded temporal receptive field enables long-sequence inference.
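
A minimal PyTorch sketch of the two mechanisms described above: a zero-initialized convolution injecting BrushNet-branch features into the UNet, and self-attention across the frame axis. Shapes, module names, and arguments are illustrative assumptions, not DiffuEraser's actual code.

```python
import torch
import torch.nn as nn

class ZeroConvInjection(nn.Module):
    """Add branch features through a 1x1 conv initialized to zero, so the
    auxiliary branch contributes nothing at the start of training."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, unet_feat: torch.Tensor, branch_feat: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch*frames, channels, height, width)
        return unet_feat + self.proj(branch_feat)

class TemporalAttention(nn.Module):
    """Self-attention over the frame axis at each spatial location."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        bf, c, h, w = x.shape
        b = bf // num_frames
        # (B*F, C, H, W) -> (B*H*W, F, C): one attention sequence per pixel
        t = x.reshape(b, num_frames, c, h, w).permute(0, 3, 4, 1, 2).reshape(b * h * w, num_frames, c)
        out, _ = self.attn(t, t, t)
        out = out.reshape(b, h, w, num_frames, c).permute(0, 3, 4, 1, 2).reshape(bf, c, h, w)
        return x + out  # residual connection preserves spatial content

# Usage with assumed latent shapes: 2 clips of 16 frames each
frames = torch.randn(2 * 16, 64, 45, 80)
frames = TemporalAttention(64)(frames, num_frames=16)
```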

Quick Start & Requirements

  • Install: Clone the repository and install dependencies with pip install -r requirements.txt inside a Python 3.9.19 Conda environment.
  • Pretrained Models: Requires downloading weights from Hugging Face/ModelScope: Stable Diffusion v1.5 (the full repository exceeds 30GB, though only ~4GB of essential components are needed), PCM_Weights, ProPainter, and sd-vae-ft-mse; see the sketch after this list. An optional motion adapter is available for training.
  • Inference: Run python run_diffueraser.py after placing the weights and configuring the input video and mask paths.
  • Resources: Inference requires significant GPU memory (about 12GB at 640x360, 33GB at 1280x720) and long runtimes.
  • Docs: Project Page, ModelScope Demo
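
As noted in the Pretrained Models item, a hedged sketch of fetching the Hugging Face-hosted weights with huggingface_hub; the repo IDs and target directories are assumptions, so consult the repository README for the exact sources (some weights are also distributed via ModelScope).

```python
from huggingface_hub import snapshot_download

WEIGHTS = {
    # Repo IDs below are assumptions, not the project's documented sources.
    "stable-diffusion-v1-5/stable-diffusion-v1-5": "weights/stable-diffusion-v1-5",
    "stabilityai/sd-vae-ft-mse": "weights/sd-vae-ft-mse",
    "wangfuyun/PCM_Weights": "weights/PCM_Weights",
}

for repo_id, local_dir in WEIGHTS.items():
    # snapshot_download fetches the whole repo; pass allow_patterns to
    # restrict the SD v1.5 download to the ~4GB of essential components.
    snapshot_download(repo_id=repo_id, local_dir=local_dir)

# ProPainter weights are obtained separately; see the repository README.
```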

Highlighted Details

  • Outperforms the state-of-the-art ProPainter in content completeness and temporal consistency.
  • Integrates the BrushNet and AnimateDiff architectures.
  • Supports training and evaluation pipelines for custom datasets.
  • Inference code was released in January 2025, training code in March 2025.

Maintenance & Community

The project is developed by researchers from Tongyi Lab, Alibaba Group. Community interaction is encouraged via GitHub Discussions.

Licensing & Compatibility

Licensed under Apache License 2.0, with the caveat that users must comply with ProPainter's license because it is used as the prior model. This may restrict commercial use or linking with closed-source applications.

Limitations & Caveats

The project relies heavily on large pretrained models, requiring substantial disk space and GPU resources. The license of the prior model (ProPainter) may introduce compatibility issues for certain commercial or closed-source use cases.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 30 days

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

Explore Similar Projects

METER by zdou0830

Multimodal framework for vision-and-language transformer research

373 stars
Created 3 years ago
Updated 2 years ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

NExT-GPT by NExT-GPT

Any-to-any multimodal LLM research paper

4k stars
Created 2 years ago
Updated 4 months ago