cross-image-attention by garibida

Research paper implementation for zero-shot appearance transfer

created 1 year ago
376 stars

Top 76.7% on sourcepulse

Project Summary

This repository provides the official implementation for "Cross-Image Attention for Zero-Shot Appearance Transfer," a SIGGRAPH 2024 paper. It enables users to transfer the visual appearance between objects with similar semantics but different shapes, leveraging the semantic understanding of text-to-image generative models. The primary audience is researchers and practitioners in computer vision and generative AI interested in zero-shot image manipulation.

How It Works

The core mechanism builds upon the self-attention layers of diffusion models. It introduces a cross-image attention mechanism that implicitly establishes semantic correspondences between two input images: one for structure and one for appearance. By combining queries from the structure image with keys and values from the appearance image during the denoising process, it generates an output image that merges the desired structure and appearance without requiring any training or optimization.
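
To make this concrete, here is a minimal sketch of the modified attention step (illustrative only; the function name and tensor shapes are assumptions, not the repository's actual code). In ordinary self-attention, Q, K, and V are all projected from the same image's features; here the queries come from the structure image while the keys and values come from the appearance image:

```python
import torch

def cross_image_attention(q_struct, k_app, v_app):
    """Sketch of cross-image attention (simplified, single head).

    q_struct: queries from the structure image's features, shape (B, N, d)
    k_app, v_app: keys/values from the appearance image's features, shape (B, M, d)
    """
    d = q_struct.shape[-1]
    # The softmax over appearance locations acts as an implicit semantic
    # correspondence: each structure-image query attends to the most
    # semantically similar appearance-image locations.
    attn = torch.softmax(q_struct @ k_app.transpose(-1, -2) / d ** 0.5, dim=-1)
    # The output places appearance features (values) at positions dictated
    # by the structure image's queries; no training or optimization is needed.
    return attn @ v_app
```

Because this swap happens inside the denoising model's existing self-attention layers, no new parameters are introduced, which is what keeps the method zero-shot.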

Quick Start & Requirements

  • Install: conda env create -f environment/environment.yaml, then conda activate cross_image.
  • Prerequisites: a working Conda installation; all dependencies are specified in environment/environment.yaml.
  • Usage: python run.py --app_image_path /path/to/appearance/image.png --struct_image_path /path/to/structure/image.png --output_path /path/to/output/images.png --domain_name [domain].
  • Resources: A Google Colab demo notebook and a HuggingFace demo are available.

Highlighted Details

  • Zero-shot appearance transfer without optimization or training.
  • Robust to variations in shape, size, and viewpoint.
  • Leverages cross-image attention for implicit semantic correspondences.
  • Incorporates mechanisms like masked AdaIN and FreeU for improved output quality.
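
Of these, AdaIN has a simple closed form: it renormalizes the channel-wise statistics of one feature map to match another's, and the masked variant restricts this to the foreground object. A minimal sketch under that reading (the function name, shapes, and mask handling are assumptions, not the repository's API):

```python
import torch

def masked_adain(content, style, content_mask, style_mask, eps=1e-5):
    """AdaIN applied only inside foreground masks.

    content, style: feature maps of shape (C, H, W)
    content_mask, style_mask: boolean masks of shape (H, W)
    """
    out = content.clone()
    c = content[:, content_mask]   # (C, Nc) foreground content features
    s = style[:, style_mask]       # (C, Ns) foreground style features
    # Whiten the content foreground per channel, then re-color it with the
    # appearance image's per-channel mean and standard deviation.
    c_norm = (c - c.mean(1, keepdim=True)) / (c.std(1, keepdim=True) + eps)
    out[:, content_mask] = c_norm * s.std(1, keepdim=True) + s.mean(1, keepdim=True)
    return out
```

Restricting the statistics to the masked regions keeps background pixels from skewing the color and contrast transfer.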

Maintenance & Community

The code builds upon the HuggingFace diffusers library and borrows code from other repositories for inversion, masking, and generation quality improvements. Citation details are provided for academic use.

Licensing & Compatibility

The repository's README does not state a license. It builds on the HuggingFace diffusers library (Apache-2.0), but a dependency's license says nothing about the repository's own terms; users should verify licensing before any commercial or closed-source use.

Limitations & Caveats

The domain_name parameter is required whenever use_masked_adain is enabled, since it is used to compute the object masks; for domains that are hard to name, masked AdaIN may need to be disabled (presumably by passing a false value for use_masked_adain, following the flag convention of the usage line above). As the official implementation of a research paper, the project is likely intended primarily for research use rather than production.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
