Image editing technique (research paper)
This repository provides the official implementation for "Blended Diffusion for Text-driven Editing of Natural Images," a method for localized image manipulation using natural language prompts and region masks. It targets researchers and practitioners in computer vision and graphics interested in intuitive, text-guided image editing. The primary benefit is enabling precise, semantic edits on generic images while preserving background realism.
How It Works
The approach combines a pre-trained CLIP model for text-guidance with a diffusion probabilistic model (DDPM) for realistic image generation. It achieves localized editing by spatially blending noised versions of the input image with the text-guided diffusion latent across various noise levels. This "blending" strategy ensures seamless integration of the edited region with the original image content. Augmentations are incorporated into the diffusion process to mitigate adversarial artifacts.
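The blending step described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the names `noise_to_level`, `blend_step`, `blended_loop`, and `guided_denoise` are hypothetical, `alpha_bar` stands for the cumulative noise-schedule product of a DDPM, and the CLIP-guided denoising step is replaced by a placeholder callable.

```python
import numpy as np


def noise_to_level(x0, alpha_bar, rng):
    """Forward-diffuse the clean input x0 to the noise level given by alpha_bar."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def blend_step(x_guided, x0, mask, alpha_bar, rng):
    """Keep the guided latent inside the mask and the noised input outside it."""
    x_background = noise_to_level(x0, alpha_bar, rng)
    return mask * x_guided + (1.0 - mask) * x_background


def blended_loop(x0, mask, alpha_bars, guided_denoise, rng):
    """Run the blend at every noise level.

    alpha_bars is ordered from most noisy (near 0) to clean (1.0) and gives
    the noise level each step should end at. guided_denoise is a stand-in
    (assumption) for one text-guided reverse diffusion step.
    """
    x = rng.standard_normal(x0.shape)  # start from pure noise
    for alpha_bar in alpha_bars:
        x_guided = guided_denoise(x, alpha_bar)
        x = blend_step(x_guided, x0, mask, alpha_bar, rng)
    return x
```

Because the last level uses `alpha_bar = 1.0`, the background term reduces to the unmodified input, so pixels outside the mask come out identical to the original image while the masked region carries the guided result.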
Quick Start & Requirements
Install: create and activate a conda environment (conda create --name blended-diffusion python=3.9, then conda activate blended-diffusion) and install dependencies (pip3 install ftfy regex matplotlib lpips kornia opencv-python torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html). Download a pre-trained diffusion model checkpoint.
Dependencies: ftfy, regex, matplotlib, lpips, kornia, opencv-python.
Run: python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output"
Lower --batch_size if you encounter CUDA OOM errors.
Highlighted Details
Maintenance & Community
The project is the official implementation for a CVPR 2022 paper. The authors mention a follow-up project, "Blended Latent Diffusion," which offers improved results and speed.
Licensing & Compatibility
The repository does not explicitly state a license in the README.
Limitations & Caveats
The README suggests generating a large number of results (e.g., 64) to find the best ones, indicating potential variability or sub-optimal results from single runs. The dependency on PyTorch 1.9.0 with CUDA 11.1 may pose compatibility challenges with newer CUDA versions or PyTorch releases.
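One way to handle that variability is to score every generated result and keep only the top few. The sketch below assumes each result already has a numeric quality score (for example, a similarity between the output and the prompt); the `pick_best` helper, the file names, and the scores are all made up for illustration and are not taken from the repository.

```python
import heapq


def pick_best(results, k=4):
    """Return the k highest-scoring (name, score) pairs, best first."""
    return heapq.nlargest(k, results, key=lambda pair: pair[1])


# Hypothetical scores for a small batch of generated edits (made-up numbers).
results = [
    ("out_00.png", 0.21),
    ("out_01.png", 0.34),
    ("out_02.png", 0.29),
    ("out_03.png", 0.18),
]
best = pick_best(results, k=2)
```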