CrossAttentionControl  by bloc97

Image editing via cross-attention control in Stable Diffusion

created 2 years ago
1,337 stars

Top 30.7% on sourcepulse

Project Summary

This repository provides an unofficial implementation of Cross-Attention Control for Stable Diffusion, enabling fine-grained image editing by manipulating internal attention maps without requiring masks or model fine-tuning. It targets users seeking more precise control over diffusion models than prompt engineering alone offers, facilitating tasks like object replacement, style transfer, and attribute modification with minimal performance impact.

How It Works

The core mechanism involves modifying the cross-attention maps generated by Stable Diffusion during the inference process. By adjusting the weights of specific tokens within the attention layers, users can influence which parts of the prompt correspond to which visual elements in the generated image. This approach bypasses the need for explicit masking or retraining, offering a more intuitive and efficient editing workflow.
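The token reweighting described above can be illustrated with a small NumPy sketch. This is not the repository's code: the function names, shapes, and the choice to scale the attention map after the softmax without renormalizing are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v, token_weights=None):
    """Single-head cross-attention.

    q: (pixels, d) image queries; k, v: (tokens, d) prompt keys/values.
    token_weights: optional (tokens,) multipliers (hypothetical parameter).
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    attn = softmax(q @ k.T * scale, axis=-1)  # (pixels, tokens)
    if token_weights is not None:
        # Assumption: scale each token's column of the attention map.
        attn = attn * token_weights
    return attn @ v, attn

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))   # 4 image positions
k = rng.normal(size=(3, 8))   # 3 prompt tokens
v = rng.normal(size=(3, 8))

base_out, base_attn = cross_attention(q, k, v)
# Double the influence of the second token, leave the others unchanged.
edit_out, edit_attn = cross_attention(q, k, v, np.array([1.0, 2.0, 1.0]))
```

Here every image position attends twice as strongly to the second token in the edited pass, while the queries, keys, and values stay untouched, which is why no mask or retraining is involved.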

Quick Start & Requirements

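  • Install dependencies via pip install torch transformers diffusers==0.4.1 numpy pillow tqdm. PIL is provided by the pillow package, and difflib is part of the Python standard library, so neither is installed under those names.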
  • A Jupyter notebook is provided for usage. A Colab demo is also available.
  • Requires Python and PyTorch. GPU acceleration is highly recommended for practical use.
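Because the project pins diffusers to 0.4.1, it may help to confirm the installed version before opening the notebook. The sketch below uses only the standard library; the helper name check_pins and the REQUIRED mapping are hypothetical and not part of the repository.

```python
from importlib import metadata

# Pin taken from the install line above; extend as needed.
REQUIRED = {"diffusers": "0.4.1"}

def check_pins(required, get_version=metadata.version):
    """Return a list of human-readable problems (empty if all pins match)."""
    problems = []
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}=={found} is installed (expected {wanted})")
    return problems

for line in check_pins(REQUIRED):
    print(line)
```

An empty result means the environment matches the pin; any printed line points at a package to install or downgrade.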

Highlighted Details

  • Enables image inversion to find latent vectors for existing images, allowing them to be edited with cross-attention control.
  • Includes a finite difference gradient descent method to support higher CFG values during inversion.
  • Demonstrates capabilities in target replacement, style injection, global editing, and direct token attention control.
  • Compares favorably to standard prompt editing, which often results in unintended changes to image composition and style.
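The finite difference gradient descent mentioned above can be sketched in general form on a toy problem. This is not the repository's inversion routine: the loss, step size, and function names are assumptions chosen to show the idea of descending along a gradient estimated from function evaluations alone.

```python
def finite_difference_grad(f, x, eps=1e-5):
    """Estimate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def gradient_descent(f, x0, lr=0.1, steps=200):
    """Minimize f starting from x0 using only evaluations of f."""
    x = list(x0)
    for _ in range(steps):
        g = finite_difference_grad(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def toy_loss(x):
    # Stand-in for a reconstruction error; the minimum is at (1, -2).
    target = [1.0, -2.0]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

solution = gradient_descent(toy_loss, [0.0, 0.0])
print(solution)
```

The appeal of this style of optimization is that it needs no backpropagation through the function being minimized, at the cost of two evaluations per coordinate per step.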

Maintenance & Community

  • The project is unofficial and appears to be a single-author effort.
  • No explicit community channels (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Check the repository for a license file before reusing the code.
  • Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

  • Compatibility is explicitly tied to diffusers==0.4.1, indicating potential fragility with newer versions.
  • The effectiveness of edits can be sensitive to prompt phrasing and parameter tuning, requiring experimentation.
  • As an unofficial, single-author project, it has a low bus factor and may be less rigorously tested than an official implementation.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 8 stars in the last 90 days
