TF-ICON by Shilin-LU

Research paper for training-free cross-domain image composition

created 2 years ago
825 stars

Top 43.9% on sourcepulse

Project Summary

TF-ICON is a framework for seamless cross-domain image composition using diffusion models, targeting researchers and practitioners in generative AI and computer vision. It enables integrating user-provided objects into new visual contexts without requiring model fine-tuning or instance-specific optimization, leveraging pre-trained diffusion models for high-quality results.

How It Works

TF-ICON utilizes text-driven diffusion models for image composition by first inverting real images into latent representations. A key innovation is the "exceptional prompt," a specially crafted text prompt that facilitates accurate inversion of real images without prior knowledge. This allows the model to compose foreground objects into background images by manipulating these latent representations, preserving the rich priors of off-the-shelf diffusion models.
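To make the inversion step concrete, here is a minimal NumPy sketch of a deterministic DDIM inversion loop that maps a real image to a latent code. The toy `eps_model` stands in for Stable Diffusion's noise predictor conditioned on the exceptional prompt; all names and the schedule are illustrative assumptions, not code from the repository.

```python
import numpy as np

def ddim_invert(x0, eps_model, alphas_cumprod, num_steps):
    """Deterministically map an image x0 to a noisy latent x_T by running
    the DDIM update forward in time (inversion), reusing the noise the
    model predicts at each step."""
    x = x0
    # Evenly spaced timesteps over the diffusion schedule.
    timesteps = np.linspace(0, len(alphas_cumprod) - 1, num_steps, dtype=int)
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        # Noise prediction; in TF-ICON this would be the diffusion model
        # conditioned on the "exceptional prompt".
        eps = eps_model(x, t_cur)
        # Predict the clean image, then re-noise it at the next timestep.
        x0_pred = (x - np.sqrt(1 - a_cur) * eps) / np.sqrt(a_cur)
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1 - a_next) * eps
    return x
```

Running the same update with the timestep order reversed plays the process backward, which is what allows composed latents to be re-synthesized into an image.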

Quick Start & Requirements

  • Install: Clone the repository and set up a Conda environment (conda env create -f tf_icon_env.yaml) or a Python virtual environment (python -m venv venv, source venv/bin/activate, pip install -e .).
  • Prerequisites: Stable Diffusion v2.1 weights (v2-1_512-ema-pruned.ckpt) downloaded to ./ckpt. CUDA 11.3 or compatible drivers are recommended.
  • Hardware: 20 GB VRAM minimum, 23 GB recommended.
  • Links: Project Page, Poster

Highlighted Details

  • Outperforms state-of-the-art inversion methods on CelebA-HQ, COCO, and ImageNet datasets.
  • Achieves superior results in diverse visual domains including sketchy painting, oil painting, photorealism, and cartoon styles.
  • Enables training-free, cross-domain image composition by leveraging pre-trained diffusion models.
  • Introduces an "exceptional prompt" for accurate real-image inversion into latent space.

Maintenance & Community

The project is the official implementation for a paper presented at ICCV 2023. It builds upon Stable-Diffusion and Prompt-to-Prompt, indicating reliance on established open-source projects.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it depends on Stable Diffusion (which carries its own license terms) and Prompt-to-Prompt, and those terms should be reviewed before commercial use or closed-source integration.

Limitations & Caveats

The 20 GB minimum VRAM requirement may put the method out of reach for users with consumer GPUs. The README also does not report performance benchmarks or document failure cases of the "exceptional prompt" across different image types.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 14 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

Latent text-to-image diffusion model

  • 71k stars, top 0.1%
  • created 3 years ago, updated 1 year ago