TF-ICON by Shilin-LU

Research paper for training-free cross-domain image composition

created 2 years ago
825 stars

Top 43.9% on sourcepulse

Project Summary

TF-ICON is a framework for seamless cross-domain image composition using diffusion models, targeting researchers and practitioners in generative AI and computer vision. It enables integrating user-provided objects into new visual contexts without requiring model fine-tuning or instance-specific optimization, leveraging pre-trained diffusion models for high-quality results.

How It Works

TF-ICON utilizes text-driven diffusion models for image composition by first inverting real images into latent representations. A key innovation is the "exceptional prompt," a specially crafted text prompt that facilitates accurate inversion of real images without prior knowledge. This allows the model to compose foreground objects into background images by manipulating these latent representations, preserving the rich priors of off-the-shelf diffusion models.
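To make the inversion step concrete, here is a minimal NumPy sketch of a deterministic DDIM inversion loop that maps a real image to a latent code. The toy `eps_model` stands in for Stable Diffusion's noise predictor conditioned on the exceptional prompt; all names and the schedule are illustrative assumptions, not code from the repository.

```python
import numpy as np

def ddim_invert(x0, eps_model, alphas_cumprod, num_steps):
    """Deterministically map an image x0 to a noisy latent x_T by running
    the DDIM update forward in time (inversion), reusing the noise the
    model predicts at each step."""
    x = x0
    # Evenly spaced timesteps over the diffusion schedule.
    timesteps = np.linspace(0, len(alphas_cumprod) - 1, num_steps, dtype=int)
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        # Noise prediction; in TF-ICON this would be the diffusion model
        # conditioned on the "exceptional prompt".
        eps = eps_model(x, t_cur)
        # Predict the clean image, then re-noise it at the next timestep.
        x0_pred = (x - np.sqrt(1 - a_cur) * eps) / np.sqrt(a_cur)
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1 - a_next) * eps
    return x
```

Running the same update with the timestep order reversed plays the process backward, which is what allows composed latents to be re-synthesized into an image.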

Quick Start & Requirements

  • Install: Clone the repository and set up a Conda environment (conda env create -f tf_icon_env.yaml) or a Python virtual environment (python -m venv venv, source venv/bin/activate, pip install -e .).
  • Prerequisites: Stable Diffusion v2.1 weights (v2-1_512-ema-pruned.ckpt) downloaded to ./ckpt. CUDA 11.3 or compatible drivers are recommended.
  • Hardware: 20 GB VRAM minimum, 23 GB recommended.
  • Links: Project Page, Poster

Highlighted Details

  • Outperforms state-of-the-art inversion methods on CelebA-HQ, COCO, and ImageNet datasets.
  • Achieves superior results in diverse visual domains including sketchy painting, oil painting, photorealism, and cartoon styles.
  • Enables training-free, cross-domain image composition by leveraging pre-trained diffusion models.
  • Introduces an "exceptional prompt" for accurate real-image inversion into latent space.

Maintenance & Community

The project is the official implementation for a paper presented at ICCV 2023. It builds upon Stable-Diffusion and Prompt-to-Prompt, indicating reliance on established open-source projects.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it depends on Stable Diffusion (which carries its own license terms) and Prompt-to-Prompt, and those terms should be reviewed before commercial use or closed-source integration.

Limitations & Caveats

The 20 GB minimum VRAM requirement may put the method out of reach for users with consumer GPUs. The README also does not report performance benchmarks or document failure cases of the "exceptional prompt" across different image types.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 14 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

Latent text-to-image diffusion model

  • 71k stars, top 0.1%
  • created 3 years ago, updated 1 year ago