CrossAttentionControl by bloc97

Image editing via cross-attention control in Stable Diffusion

Created 3 years ago

1,345 stars

Top 29.6% on SourcePulse

3 Experts Love This Project

andreasjansson (Andreas Jansson, Cofounder of Replicate)

jiamings (Chief Scientist at Luma AI)

osanseviero (Omar Sanseviero, DevRel at Google DeepMind)

Project Summary

This repository provides an unofficial implementation of Cross-Attention Control for Stable Diffusion, enabling fine-grained image editing by manipulating internal attention maps without requiring masks or model fine-tuning. It targets users seeking more precise control over diffusion models than prompt engineering alone offers, facilitating tasks like object replacement, style transfer, and attribute modification with minimal performance impact.

How It Works

The core mechanism modifies the cross-attention maps that Stable Diffusion computes between image positions and prompt tokens during inference. When a prompt is edited, the attention maps recorded for tokens shared with the original prompt can be reused, so those tokens keep their spatial layout while only the changed tokens produce new content. The attention weight of individual tokens can also be scaled up or down to strengthen or weaken their influence on the image. This approach avoids explicit masking and retraining, giving a more intuitive and efficient editing workflow.
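The mechanism can be illustrated with a small, framework-free sketch. It is not the repository's code: the function names are hypothetical, the prompt is a plain list of words rather than padded CLIP tokens, and attention is a single head at a single denoising step. Aligning the two prompts with difflib.SequenceMatcher is an assumed choice (difflib appears in the project's requirements, but how the repository uses it is not shown here). Token weights scale the post-softmax probabilities without renormalization, which is only one simple option.

```python
import difflib
import math


def softmax(row):
    # Numerically stable softmax over one row of attention logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(queries, keys):
    # Scaled dot-product attention probabilities: one row per image query
    # (spatial position), one column per prompt token.
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def align_tokens(src_tokens, edit_tokens):
    # Hypothetical helper: map each edited-prompt token index to the matching
    # source-prompt token index, or None if the token is new or replaced.
    mapping = [None] * len(edit_tokens)
    matcher = difflib.SequenceMatcher(a=src_tokens, b=edit_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.b + offset] = block.a + offset
    return mapping


def controlled_probs(edit_probs, src_probs, mapping, token_weights=None):
    # Hypothetical helper: for tokens shared with the source prompt, reuse the
    # source attention column so their layout is preserved; new tokens keep the
    # attention computed for the edited prompt. Then optionally scale each
    # token's column by a per-token weight (rows are not renormalized here).
    out = []
    for edit_row, src_row in zip(edit_probs, src_probs):
        row = [src_row[j] if j is not None else p for p, j in zip(edit_row, mapping)]
        if token_weights is not None:
            row = [p * w for p, w in zip(row, token_weights)]
        out.append(row)
    return out
```

For example, aligning "a cat riding a bicycle" with "a dog riding a bicycle" maps every token except "dog" back to its source position, so only the "dog" column comes from the edited prompt's own attention.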

Quick Start & Requirements

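Install dependencies with pip install torch transformers diffusers==0.4.1 numpy pillow tqdm (PIL is provided by the pillow package; difflib is part of the Python standard library and needs no installation).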
A Jupyter notebook demonstrates usage, and a Colab demo is also available.
Requires Python and PyTorch. GPU acceleration is highly recommended for practical use.
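Because the code targets diffusers==0.4.1, a quick guard can catch a mismatched environment before any model is loaded. The sketch below is a hypothetical helper (diffusers_version_ok is not part of the repository) that reads the installed version through the standard-library importlib.metadata module; it assumes an exact-match pin is what you want.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pinned in the project's requirements.
REQUIRED_DIFFUSERS = "0.4.1"


def diffusers_version_ok(installed=None):
    """Return True if diffusers matches the pinned version exactly.

    `installed` may be passed explicitly (e.g. for testing); otherwise it is
    read from the installed package metadata. A missing package counts as a
    mismatch.
    """
    if installed is None:
        try:
            installed = version("diffusers")
        except PackageNotFoundError:
            return False
    return installed == REQUIRED_DIFFUSERS
```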

Highlighted Details

Enables image inversion to find latent vectors for existing images, allowing them to be edited with cross-attention control.
Includes a finite-difference gradient descent method that supports higher classifier-free guidance (CFG) scales during inversion.
Demonstrates capabilities in target replacement, style injection, global editing, and direct token attention control.
Contrasts with plain prompt editing (regenerating from a modified prompt), which often changes the image's composition and style unintentionally.
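How the repository applies finite differences during inversion is not described here, so the sketch below shows only the general technique on a toy objective: each gradient component is estimated with a central difference, (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps), and the estimate drives ordinary gradient-descent steps. In an inversion setting x would be a latent and f a reconstruction error; that mapping is an assumption, and the function names are hypothetical.

```python
def finite_difference_grad(f, x, eps=1e-4):
    # Central-difference estimate of the gradient of scalar function f at x.
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return grad


def fd_gradient_descent(f, x0, lr=0.1, steps=200, eps=1e-4):
    # Plain gradient descent driven by the finite-difference gradient estimate.
    x = list(x0)
    for _ in range(steps):
        g = finite_difference_grad(f, x, eps)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Toy quadratic with its minimum at (3, -1).
    def loss(x):
        return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

    print(fd_gradient_descent(loss, [0.0, 0.0]))
```

Each step costs 2n evaluations of f for an n-dimensional x, so the method becomes expensive quickly as the dimension grows.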

Maintenance & Community

The project is unofficial and appears to be a single-author effort.
No explicit community channels (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license; the license should be verified in the repository itself before any reuse.
Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The code is pinned to diffusers==0.4.1, so it is likely to break with newer diffusers releases without modification.
The effectiveness of edits can be sensitive to prompt phrasing and parameter tuning, requiring experimentation.
As an unofficial, single-author project, it has a bus factor of one and has likely seen less rigorous testing than official implementations.

Health Check

Last Commit: 3 years ago

Responsiveness: Inactive

Pull Requests (30d): 0

Issues (30d): 0

Star History: 0 stars in the last 30 days

Explore Similar Projects

Awesome-Image-Editing by FudanCVL

Survey of multimodal-guided image editing with diffusion models

Created 1 year ago

Updated 4 months ago

OneReward by bytedance

Unified mask-guided image generation and editing

Created 4 months ago

Updated 3 months ago

Forgedit by witcherofresearch

Text-guided image editor via diffusion model fine-tuning

Created 2 years ago

Updated 1 year ago

StyleKeeper by naver-ai

Text-to-image research paper for stylized generation via visual style prompting

Created 1 year ago

Updated 1 month ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

blended-diffusion by omriav

Image editing technique (research paper)

Created 4 years ago

Updated 1 year ago

Starred by Robin Rombach (Cofounder of Black Forest Labs).

glid-3-xl by Jack000

Latent diffusion model for image generation and editing

Created 3 years ago

Updated 3 years ago

Step1X-Edit by stepfun-ai

Image editing model comparable to closed-source alternatives

Created 8 months ago

Updated 1 week ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA).

FateZero by ChenyangQiQi

Zero-shot video editor (ICCV 2023 Oral) using attention fusion

Created 2 years ago

Updated 2 years ago

Starred by Chenlin Meng (Cofounder of Pika).

DiffusionCLIP by gwang-kim

Diffusion model for text-guided image manipulation

Created 4 years ago

Updated 2 years ago

Starred by Ettore Di Giacinto (Author of LocalAI) and Simon Willison (Coauthor of Django).

ml-mgie by apple

Image editing via multimodal LLMs (research paper)

Created 2 years ago

Updated 1 year ago

Starred by Chenlin Meng (Cofounder of Pika), Elvis Saravia (Founder of DAIR.AI), and 3 more.

prompt-to-prompt by google

Image editing via prompt manipulation, based on attention control

Created 3 years ago

Updated 1 year ago

Starred by Luis Capelo (Cofounder of Lightning AI), Alexander Borzunov (Research Scientist at OpenAI), and 9 more.

StyleCLIP by orpatashnik

Text-driven StyleGAN imagery manipulation via CLIP models

Created 4 years ago

Updated 2 years ago
