cross-image-attention by garibida

Research paper implementation for zero-shot appearance transfer

created 1 year ago
376 stars

Top 76.7% on sourcepulse

Project Summary

This repository provides the official implementation for "Cross-Image Attention for Zero-Shot Appearance Transfer," a SIGGRAPH 2024 paper. It enables users to transfer the visual appearance between objects with similar semantics but different shapes, leveraging the semantic understanding of text-to-image generative models. The primary audience is researchers and practitioners in computer vision and generative AI interested in zero-shot image manipulation.

How It Works

The core mechanism builds upon the self-attention layers of diffusion models. It introduces a cross-image attention mechanism that implicitly establishes semantic correspondences between two input images: one for structure and one for appearance. By combining queries from the structure image with keys and values from the appearance image during the denoising process, it generates an output image that merges the desired structure and appearance without requiring any training or optimization.
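
To make this concrete, here is a minimal sketch of the modified attention step (illustrative only; the function name and tensor shapes are assumptions, not the repository's actual code). In ordinary self-attention, Q, K, and V are all projected from the same image's features; here the queries come from the structure image while the keys and values come from the appearance image:

```python
import torch

def cross_image_attention(q_struct, k_app, v_app):
    """Sketch of cross-image attention (simplified, single head).

    q_struct: queries from the structure image's features, shape (B, N, d)
    k_app, v_app: keys/values from the appearance image's features, shape (B, M, d)
    """
    d = q_struct.shape[-1]
    # The softmax over appearance locations acts as an implicit semantic
    # correspondence: each structure-image query attends to the most
    # semantically similar appearance-image locations.
    attn = torch.softmax(q_struct @ k_app.transpose(-1, -2) / d ** 0.5, dim=-1)
    # The output places appearance features (values) at positions dictated
    # by the structure image's queries; no training or optimization is needed.
    return attn @ v_app
```

Because this swap happens inside the denoising model's existing self-attention layers, no new parameters are introduced, which is what keeps the method zero-shot.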

Quick Start & Requirements

  • Install: conda env create -f environment/environment.yaml, then conda activate cross_image.
  • Prerequisites: a working Conda installation; all dependencies are specified in environment/environment.yaml.
  • Usage: python run.py --app_image_path /path/to/appearance/image.png --struct_image_path /path/to/structure/image.png --output_path /path/to/output/images.png --domain_name [domain].
  • Resources: A Google Colab demo notebook and a HuggingFace demo are available.

Highlighted Details

  • Zero-shot appearance transfer without optimization or training.
  • Robust to variations in shape, size, and viewpoint.
  • Leverages cross-image attention for implicit semantic correspondences.
  • Incorporates mechanisms like masked AdaIN and FreeU for improved output quality.
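
Of these, AdaIN has a simple closed form: it renormalizes the channel-wise statistics of one feature map to match another's, and the masked variant restricts this to the foreground object. A minimal sketch under that reading (the function name, shapes, and mask handling are assumptions, not the repository's API):

```python
import torch

def masked_adain(content, style, content_mask, style_mask, eps=1e-5):
    """AdaIN applied only inside foreground masks.

    content, style: feature maps of shape (C, H, W)
    content_mask, style_mask: boolean masks of shape (H, W)
    """
    out = content.clone()
    c = content[:, content_mask]   # (C, Nc) foreground content features
    s = style[:, style_mask]       # (C, Ns) foreground style features
    # Whiten the content foreground per channel, then re-color it with the
    # appearance image's per-channel mean and standard deviation.
    c_norm = (c - c.mean(1, keepdim=True)) / (c.std(1, keepdim=True) + eps)
    out[:, content_mask] = c_norm * s.std(1, keepdim=True) + s.mean(1, keepdim=True)
    return out
```

Restricting the statistics to the masked regions keeps background pixels from skewing the color and contrast transfer.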

Maintenance & Community

The code builds upon the HuggingFace diffusers library and borrows code from other repositories for inversion, masking, and generation quality improvements. Citation details are provided for academic use.

Licensing & Compatibility

The repository's README does not state a license. It builds on the HuggingFace diffusers library (Apache-2.0), but a dependency's license says nothing about the repository's own terms; users should verify licensing before any commercial or closed-source use.

Limitations & Caveats

The domain_name parameter is required whenever use_masked_adain is enabled, since it is used to compute the object masks; for domains that are hard to name, masked AdaIN may need to be disabled (presumably by passing a false value for use_masked_adain, following the flag convention of the usage line above). As the official implementation of a research paper, the project is likely intended primarily for research use rather than production.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
