Attend-and-Excite by yuval-alaluf

Research paper implementation for text-to-image diffusion models

created 2 years ago
743 stars

Top 47.7% on sourcepulse

View on GitHub
Project Summary

This repository provides the official implementation of "Attend-and-Excite," a method for improving the semantic faithfulness of text-to-image diffusion models. It addresses failure modes such as "catastrophic neglect," where the model omits subjects specified in the prompt, as well as incorrect attribute binding. The target audience is researchers and practitioners working with diffusion models who need generated images to follow complex text prompts more accurately.

How It Works

Attend-and-Excite employs a Generative Semantic Nursing (GSN) technique with an attention-based formulation. It intervenes on the fly during inference: at each denoising step it examines the cross-attention maps between the image and the prompt tokens and refines the noised latents so that every subject token is attended to and its activations are strengthened ("excited"), guiding the model to actually generate each described element.
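As a rough illustration (not the repository's actual code), the sketch below shows the kind of per-step latent update this implies. The get_cross_attention_maps helper is hypothetical and stands in for running the UNet and aggregating its cross-attention layers; the loss pushes the most-neglected subject token toward a higher maximum attention value.

```python
import torch

def attend_and_excite_step(latents, get_cross_attention_maps, token_indices, step_size=0.1):
    """One Generative Semantic Nursing update (illustrative sketch only).

    latents: current noised latents, e.g. shape (1, 4, 64, 64) for Stable Diffusion.
    get_cross_attention_maps: hypothetical differentiable callable that runs the UNet
        on `latents` and returns cross-attention maps of shape (pixels, num_tokens).
    token_indices: prompt positions of the subject tokens to "excite".
    """
    latents = latents.clone().detach().requires_grad_(True)
    attn = get_cross_attention_maps(latents)  # (pixels, num_tokens), softmaxed over tokens
    # The most-neglected subject token should reach a high maximum attention value.
    max_attn_per_token = torch.stack([attn[:, i].max() for i in token_indices])
    loss = (1.0 - max_attn_per_token).max()
    # Shift the latent code in the direction that strengthens that token's activations.
    grad = torch.autograd.grad(loss, latents)[0]
    return (latents - step_size * grad).detach()
```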

Quick Start & Requirements

  • Install: Follow the setup instructions in the environment/environment.yaml file for the base Stable Diffusion environment, then install additional requirements from environment/requirements.txt.
  • Prerequisites: Requires the official Stable Diffusion repository setup and Hugging Face's Diffusers library.
  • Usage: Run python run.py --prompt "a cat and a dog" --seeds [0] --token_indices [2,5] (a diffusers-based sketch follows this list).
  • Resources: Supports torch.float16 for reduced memory usage and faster inference.
  • Docs: Notebooks for generation and explainability are provided.
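The snippet below is a minimal usage sketch based on the Attend-and-Excite pipeline bundled with recent versions of Hugging Face diffusers, not the repository's own run.py; the model ID, token indices, and torch.float16 settings are illustrative and may need adjusting for your setup.

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

# Half precision reduces memory use and speeds up inference,
# at the cost of a slight quality drop in some cases (see Limitations & Caveats).
pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a cat and a dog",
    token_indices=[2, 5],  # positions of "cat" and "dog" in the tokenized prompt
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("cat_and_dog.png")
```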

Highlighted Details

  • Enhances semantic faithfulness in text-to-image generation by addressing catastrophic neglect and attribute binding.
  • Utilizes an attention-based mechanism to guide cross-attention units during synthesis.
  • Includes code for reproducing quantitative experiments, such as CLIP similarity metrics (an illustrative CLIP similarity sketch follows this list).
  • Offers notebooks for generating images and visualizing cross-attention maps for explainability.
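For the quantitative side, the following is an illustrative sketch of a CLIP image-text similarity score using Hugging Face transformers; the repository ships its own evaluation scripts, and the exact metric and CLIP checkpoint it uses may differ.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Cosine similarity between CLIP image and text embeddings (sketch only).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

def clip_similarity(image_path: str, text: str) -> float:
    image = Image.open(image_path)
    inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    img_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    txt_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return float((img_emb * txt_emb).sum())

print(clip_similarity("cat_and_dog.png", "a cat and a dog"))
```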

Maintenance & Community

The project is associated with Tel Aviv University. Citation details are provided for academic use.

Licensing & Compatibility

The repository builds upon the diffusers library and the Prompt-to-Prompt codebase. Specific licensing details beyond this are not explicitly stated in the README.

Limitations & Caveats

Using torch.float16 may slightly degrade results in some cases. The method is an inference-time enhancement to existing pre-trained diffusion models such as Stable Diffusion, not a standalone model.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 12 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

  Latent text-to-image diffusion model
  • Top 0.1% on sourcepulse
  • 71k stars
  • Created 3 years ago, updated 1 year ago