Research paper implementation for text-to-image diffusion models
This repository provides the official implementation for "Attend-and-Excite," a method to improve the semantic faithfulness of text-to-image diffusion models. It addresses issues like "catastrophic neglect" where models fail to generate all specified subjects or correctly bind attributes. The target audience includes researchers and practitioners working with diffusion models who need to enhance the accuracy of generated images based on complex text prompts.
How It Works
Attend-and-Excite employs a Generative Semantic Nursing (GSN) technique, specifically an attention-based formulation. During the inference process, it intervenes by modifying the cross-attention values within the diffusion model. This refinement encourages the model to attend to all subject tokens in the text prompt and strengthens their activations, thereby ensuring all described elements are generated with greater fidelity.
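The core idea can be sketched with a toy loss function. This is an illustrative sketch only, not the paper's code: the function name, array shapes, and the omission of the Gaussian smoothing and UNet backpropagation used in the real method are all simplifications of mine. It shows how the loss targets the most neglected subject token, whose gradient is then used to update the noised latent at each denoising step.

```python
import numpy as np

def attend_and_excite_loss(attn_maps, token_indices):
    """Toy version of the attention-based GSN loss.

    attn_maps: (H, W, T) array, one cross-attention map per text token
               (a stand-in for the model's 16x16 attention maps).
    token_indices: positions of the subject tokens in the prompt.
    """
    # For each subject token, take its maximum spatial activation ...
    max_activations = [attn_maps[:, :, t].max() for t in token_indices]
    # ... and penalize the *most neglected* one: the loss stays high while
    # any subject token lacks a strong attention patch anywhere in the image.
    return max(1.0 - a for a in max_activations)

# Toy example: "dog" (token 5) is attended to, "cat" (token 2) is neglected.
rng = np.random.default_rng(0)
attn = rng.uniform(0.0, 0.1, size=(16, 16, 8))
attn[4, 4, 5] = 0.9  # a strong attention patch for token 5 only
loss = attend_and_excite_loss(attn, token_indices=[2, 5])  # high: cat neglected
```

In the full method, the latent at each diffusion step is shifted along the negative gradient of this loss, which strengthens the neglected token's attention before denoising continues.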
Quick Start & Requirements
Use the environment/environment.yaml file to set up the base Stable Diffusion environment, then install the additional requirements from environment/requirements.txt. Run inference with:

python run.py --prompt "a cat and a dog" --seeds [0] --token_indices [2,5]

Use torch.float16 for reduced memory usage and faster inference.
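The --token_indices argument identifies the subject tokens in the tokenized prompt. A minimal sketch of how those indices arise (subject_token_indices is a hypothetical helper of mine; it assumes each word maps to exactly one CLIP token, which holds for this prompt but not for all prompts, so in practice inspect the tokenizer's output):

```python
def subject_token_indices(prompt, subjects):
    """Return the token index of each subject word in a simple prompt."""
    words = prompt.lower().split()
    # CLIP's tokenizer prepends <|startoftext|> at position 0, so for
    # single-token words the i-th word sits at token index i + 1.
    return [words.index(s) + 1 for s in subjects]

indices = subject_token_indices("a cat and a dog", ["cat", "dog"])
# matches the [2,5] passed to run.py above
```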
Maintenance & Community
The project is associated with Tel Aviv University. Citation details are provided for academic use.
Licensing & Compatibility
The repository builds upon the diffusers library and the Prompt-to-Prompt codebase. Specific licensing details beyond this are not explicitly stated in the README.
Limitations & Caveats
Using torch.float16 may lead to a slight degradation in results in some cases. The method is presented as an enhancement to existing pre-trained diffusion models such as Stable Diffusion.