blended-diffusion by omriav

Image editing technique (research paper)

created 3 years ago
577 stars

Top 56.8% on sourcepulse

Project Summary

This repository provides the official implementation for "Blended Diffusion for Text-driven Editing of Natural Images," a method for localized image manipulation using natural language prompts and region masks. It targets researchers and practitioners in computer vision and graphics interested in intuitive, text-guided image editing. The primary benefit is enabling precise, semantically meaningful edits on generic natural images while preserving background realism.

How It Works

The approach combines a pre-trained CLIP model for text guidance with a denoising diffusion probabilistic model (DDPM) for realistic image generation. It achieves localized editing by spatially blending a noised version of the input image with the text-guided diffusion latent at each noise level. This blending strategy ensures seamless integration of the edited region with the original image content. Augmentations are incorporated into the diffusion process to mitigate adversarial artifacts.
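The core blending step can be sketched in PyTorch roughly as follows. This is a conceptual illustration, not the repository's actual code: q_sample, denoise_step, and clip_guidance_grad are assumed stand-ins for the forward noising process, the DDPM reverse step, and the gradient of the CLIP text-image similarity.

    import torch

    def blended_step(x_t: torch.Tensor, t: int, input_image: torch.Tensor,
                     mask: torch.Tensor, q_sample, denoise_step, clip_guidance_grad,
                     guidance_scale: float = 1.0) -> torch.Tensor:
        # Text-guided branch: one DDPM reverse step, nudged toward the prompt
        # with the gradient of the CLIP similarity (classifier-style guidance).
        x_fg = denoise_step(x_t, t) + guidance_scale * clip_guidance_grad(x_t, t)

        # Background branch: noise the *original* input image to the matching level,
        # so the region outside the mask stays faithful to the source.
        x_bg = q_sample(input_image, t - 1)

        # Spatial blending: guided content inside the mask, noised original outside.
        return mask * x_fg + (1.0 - mask) * x_bg

Repeating this blend at every noise level is what keeps the edited region consistent with the untouched background as the image is progressively denoised.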

Quick Start & Requirements

  • Install: Create a conda environment (conda create --name blended-diffusion python=3.9, conda activate blended-diffusion) and install dependencies (pip3 install ftfy regex matplotlib lpips kornia opencv-python torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html). Download a pre-trained diffusion model checkpoint.
  • Prerequisites: Python 3.9, PyTorch 1.9.0 with CUDA 11.1, ftfy, regex, matplotlib, lpips, kornia, opencv-python.
  • Run: python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output"
  • Notes: The authors recommend generating at least 64 results and keeping the best ones; reduce --batch_size if you encounter CUDA out-of-memory (OOM) errors (see the wrapper sketch after this list).
  • Links: Code, Follow-up Project
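
For scripted runs, the documented CLI can be driven from Python. The sketch below only wraps the flags shown in the Quick Start with subprocess; the --batch_size value is illustrative.

    import subprocess

    # Invoke the repository's main.py with the flags documented above;
    # lower --batch_size if you hit CUDA out-of-memory errors.
    subprocess.run(
        [
            "python", "main.py",
            "-p", "rock",                        # text prompt
            "-i", "input_example/img.png",       # input image
            "--mask", "input_example/mask.png",  # ROI mask for the edit region
            "--output_path", "output",
            "--batch_size", "4",                 # illustrative; reduce on OOM
        ],
        check=True,
    )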

Highlighted Details

  • Enables local (region-based) edits using natural language descriptions and ROI masks.
  • Achieves seamless fusion of edited regions with unchanged image parts via spatial blending across noise levels.
  • Outperforms baselines in realism, background preservation, and text matching, as reported in the CVPR 2022 paper.
  • Supports applications like object addition/removal/alteration, background replacement, and image extrapolation.

Maintenance & Community

The project is the official implementation for a CVPR 2022 paper. The authors mention a follow-up project, "Blended Latent Diffusion," which offers improved results and speed.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The README suggests generating a large number of results (e.g., 64) to find the best ones, indicating potential variability or sub-optimal results from single runs. The dependency on PyTorch 1.9.0 with CUDA 11.1 may pose compatibility challenges with newer CUDA versions or PyTorch releases.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days
