dift by Tsingularity

Research paper code for emergent semantic correspondence via image diffusion

created 2 years ago
713 stars

Top 49.1% on sourcepulse

Project Summary

This repository provides Diffusion Features (DIFT), a method for extracting dense semantic correspondences from images using diffusion models. It is designed for researchers and practitioners in computer vision and machine learning who need robust feature extraction for tasks like image editing, segmentation, and object matching. DIFT leverages emergent properties of diffusion models to establish correspondences, offering a novel approach to feature representation.

How It Works

DIFT extracts features by querying intermediate layers of pre-trained diffusion models (Stable Diffusion and ADM) at specific timesteps and U-Net layers. This approach capitalizes on the diffusion process's ability to capture rich semantic information at various scales. By analyzing the feature maps, DIFT identifies corresponding points across images, even for objects with significant appearance or viewpoint changes.
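The matching step described above can be sketched as a nearest-neighbor lookup in feature space: normalize the per-pixel feature vectors and pick the target location with the highest cosine similarity to the query point. This is a minimal illustration, not the repo's API; the random tensors below stand in for real U-Net feature maps, and `match_point` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

def match_point(src_feat, tgt_feat, y, x):
    """Given dense feature maps of shape (C, H, W) for a source and a
    target image, return the target location whose feature vector is most
    cosine-similar to the source feature at (y, x)."""
    c, h, w = tgt_feat.shape
    query = F.normalize(src_feat[:, y, x], dim=0)       # (C,)
    keys = F.normalize(tgt_feat.reshape(c, -1), dim=0)  # (C, H*W), per-location
    sim = query @ keys                                  # cosine similarity, (H*W,)
    idx = sim.argmax().item()
    return divmod(idx, w)                               # (y, x) in the target map

# Toy check with random features; real DIFT features come from the U-Net.
torch.manual_seed(0)
src = torch.randn(16, 8, 8)
tgt = src.clone()  # identical maps, so a point should match itself
print(match_point(src, tgt, 3, 5))  # → (3, 5)
```

In the actual method the feature maps come from a chosen U-Net up-block at a chosen noise timestep, and ensembling over several noise samples makes the similarity map less sensitive to any single noise draw.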

Quick Start & Requirements

  • Install: Use conda env create -f environment.yml and conda activate dift.
  • Prerequisites: Linux machine, Python environment (via environment.yml or setup_env.sh), PyTorch.
  • Demo: An interactive Jupyter notebook (demo.ipynb) and a Colab demo are available for trying out DIFT.
  • Feature Extraction: Run python extract_dift.py with specified input/output paths, image size, timestep (t), U-Net layer index (up_ft_index), prompt, and ensemble size.
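Putting the listed parameters together, a feature-extraction call might look like the following. The flag names and example values here are inferred from the parameter list above, and the file paths are placeholders; verify everything against the repo's own `python extract_dift.py --help` before use.

```shell
# Hypothetical invocation; flag names mirror the parameters listed above
# and should be checked against the script's --help output.
python extract_dift.py \
    --input_path ./assets/cat.png \
    --output_path ./features/cat_dift.pt \
    --img_size 768 768 \
    --t 261 \
    --up_ft_index 1 \
    --prompt "a photo of a cat" \
    --ensemble_size 8
```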

Highlighted Details

  • Achieves state-of-the-art results on SPair-71k for semantic correspondence.
  • Demonstrates strong performance on HPatches for homography estimation.
  • Enables edit propagation across images, even across different categories.
  • Evaluated on DAVIS 2017 for segmentation tasks.

Maintenance & Community

The project is associated with the NeurIPS 2023 paper "Emergent Correspondence from Image Diffusion" by Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan.

Licensing & Compatibility

The repository does not explicitly state a license. The code relies on external diffusion models (Stable Diffusion, ADM) which have their own licenses. Compatibility for commercial use or closed-source linking would require verification of these underlying model licenses.

Limitations & Caveats

The README notes that ensemble_size and img_size can be reduced if memory issues are encountered, which suggests high GPU memory requirements at the default settings. Performance is also sensitive to the choice of timestep (t) and U-Net layer (up_ft_index): the best values differ across tasks and across the two backbone models, so these parameters need to be tuned rather than left at a single default.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
32 stars in the last 90 days
