Image editing via cross-attention control in Stable Diffusion
This repository provides an unofficial implementation of Cross-Attention Control for Stable Diffusion, enabling fine-grained image editing by manipulating the model's internal attention maps without requiring masks or model fine-tuning. It targets users who need more precise control over diffusion models than prompt engineering alone offers, supporting tasks such as object replacement, style transfer, and attribute modification with minimal overhead at inference time.
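As a concrete example of the workflow, the hypothetical sketch below aligns an edited prompt against the original so that unchanged tokens can keep their attention maps. The function `align_prompts` is illustrative, not the repository's API; difflib appears in the requirements, which suggests sequence matching is used in roughly this way.

```python
from difflib import SequenceMatcher

def align_prompts(src_tokens: list[str], dst_tokens: list[str]) -> list[tuple[int, int]]:
    """Return (src_idx, dst_idx) pairs for tokens shared by both prompts."""
    matcher = SequenceMatcher(None, src_tokens, dst_tokens)
    pairs = []
    for block in matcher.get_matching_blocks():
        pairs.extend((block.a + i, block.b + i) for i in range(block.size))
    return pairs

# "cat" -> "dog" is the only change, so every other token stays anchored
# and can reuse the source image's attention maps.
print(align_prompts("a cat sitting on a bench".split(),
                    "a dog sitting on a bench".split()))
```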
How It Works
The core mechanism modifies the cross-attention maps that Stable Diffusion produces during inference. By adjusting the weights of specific prompt tokens within the attention layers, users can control which parts of the prompt correspond to which visual elements in the generated image. This bypasses the need for explicit masking or retraining, offering a more direct and efficient editing workflow.
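As an illustration of the idea rather than the repository's exact implementation, the sketch below scales one prompt token's column in a cross-attention probability map and renormalizes. The function `reweight_token` and the tensor shapes are assumptions for the example; in practice such a function would be hooked into the UNet's cross-attention layers.

```python
import torch

def reweight_token(attn_probs: torch.Tensor, token_idx: int, scale: float) -> torch.Tensor:
    """Scale how strongly one prompt token attends to the image, then renormalize.

    attn_probs: cross-attention probabilities, shape [heads, pixels, tokens].
    """
    edited = attn_probs.clone()
    edited[:, :, token_idx] *= scale
    # Renormalize so each query pixel's weights still sum to 1.
    return edited / edited.sum(dim=-1, keepdim=True)

# Example: 8 heads, a 64x64 latent (4096 query pixels), 77 CLIP tokens.
probs = torch.softmax(torch.randn(8, 4096, 77), dim=-1)
boosted = reweight_token(probs, token_idx=5, scale=2.0)  # emphasize token 5
```

Renormalizing after scaling keeps the edited map a valid probability distribution, so the overall magnitude of the attention output is preserved while its focus shifts.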
Quick Start & Requirements
pip install torch transformers diffusers==0.4.1 numpy Pillow tqdm

Note that PIL is provided by the Pillow package, and difflib ships with the Python standard library, so it is not installed via pip.
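A minimal sketch of loading the pinned pipeline with the standard diffusers 0.4.x API; the repository's own scripts may wrap this differently, and the checkpoint name is an assumption.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # assumed checkpoint; may require a Hugging Face auth token
    use_auth_token=True,
)
pipe = pipe.to("cuda")

with torch.autocast("cuda"):
    image = pipe("a cat sitting on a bench").images[0]
image.save("baseline.png")
```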
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project pins diffusers==0.4.1, indicating potential fragility with newer versions of the library.