blended-diffusion by omriav

Image editing technique (research paper)

created 3 years ago
577 stars

Top 56.8% on sourcepulse

Project Summary

This repository provides the official implementation for "Blended Diffusion for Text-driven Editing of Natural Images," a method for localized image manipulation using natural language prompts and region masks. It targets researchers and practitioners in computer vision and graphics interested in intuitive, text-guided image editing. The primary benefit is enabling precise, semantically meaningful edits on generic natural images while preserving background realism.

How It Works

The approach combines a pre-trained CLIP model for text guidance with a denoising diffusion probabilistic model (DDPM) for realistic image generation. It achieves localized editing by spatially blending a noised version of the input image with the text-guided diffusion latent at each noise level. This blending strategy ensures seamless integration of the edited region with the original image content. Augmentations are incorporated into the diffusion process to mitigate adversarial artifacts.
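The core blending step can be sketched in PyTorch roughly as follows. This is a conceptual illustration, not the repository's actual code: q_sample, denoise_step, and clip_guidance_grad are assumed stand-ins for the forward noising process, the DDPM reverse step, and the gradient of the CLIP text-image similarity.

    import torch

    def blended_step(x_t: torch.Tensor, t: int, input_image: torch.Tensor,
                     mask: torch.Tensor, q_sample, denoise_step, clip_guidance_grad,
                     guidance_scale: float = 1.0) -> torch.Tensor:
        # Text-guided branch: one DDPM reverse step, nudged toward the prompt
        # with the gradient of the CLIP similarity (classifier-style guidance).
        x_fg = denoise_step(x_t, t) + guidance_scale * clip_guidance_grad(x_t, t)

        # Background branch: noise the *original* input image to the matching level,
        # so the region outside the mask stays faithful to the source.
        x_bg = q_sample(input_image, t - 1)

        # Spatial blending: guided content inside the mask, noised original outside.
        return mask * x_fg + (1.0 - mask) * x_bg

Repeating this blend at every noise level is what keeps the edited region consistent with the untouched background as the image is progressively denoised.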

Quick Start & Requirements

  • Install: Create a conda environment (conda create --name blended-diffusion python=3.9, conda activate blended-diffusion) and install dependencies (pip3 install ftfy regex matplotlib lpips kornia opencv-python torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html). Download a pre-trained diffusion model checkpoint.
  • Prerequisites: Python 3.9, PyTorch 1.9.0 with CUDA 11.1, ftfy, regex, matplotlib, lpips, kornia, opencv-python.
  • Run: python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output"
  • Notes: The authors recommend generating at least 64 results and keeping the best ones; reduce --batch_size if you encounter CUDA out-of-memory (OOM) errors (see the wrapper sketch after this list).
  • Links: Code, Follow-up Project
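
For scripted runs, the documented CLI can be driven from Python. The sketch below only wraps the flags shown in the Quick Start with subprocess; the --batch_size value is illustrative.

    import subprocess

    # Invoke the repository's main.py with the flags documented above;
    # lower --batch_size if you hit CUDA out-of-memory errors.
    subprocess.run(
        [
            "python", "main.py",
            "-p", "rock",                        # text prompt
            "-i", "input_example/img.png",       # input image
            "--mask", "input_example/mask.png",  # ROI mask for the edit region
            "--output_path", "output",
            "--batch_size", "4",                 # illustrative; reduce on OOM
        ],
        check=True,
    )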

Highlighted Details

  • Enables local (region-based) edits using natural language descriptions and ROI masks.
  • Achieves seamless fusion of edited regions with unchanged image parts via spatial blending across noise levels.
  • Outperforms baselines in realism, background preservation, and text matching, as reported in the CVPR 2022 paper.
  • Supports applications like object addition/removal/alteration, background replacement, and image extrapolation.

Maintenance & Community

The project is the official implementation for a CVPR 2022 paper. The authors mention a follow-up project, "Blended Latent Diffusion," which offers improved results and speed.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The README suggests generating a large number of results (e.g., 64) to find the best ones, indicating potential variability or sub-optimal results from single runs. The dependency on PyTorch 1.9.0 with CUDA 11.1 may pose compatibility challenges with newer CUDA versions or PyTorch releases.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days
