Research paper for training-free cross-domain image composition
TF-ICON is a training-free framework for seamless cross-domain image composition with diffusion models, aimed at researchers and practitioners in generative AI and computer vision. It integrates user-provided objects into new visual contexts without model fine-tuning or instance-specific optimization, leveraging pre-trained diffusion models for high-quality results.
How It Works
TF-ICON utilizes text-driven diffusion models for image composition by first inverting real images into latent representations. A key innovation is the "exceptional prompt," a specially crafted text prompt that facilitates accurate inversion of real images without prior knowledge. This allows the model to compose foreground objects into background images by manipulating these latent representations, preserving the rich priors of off-the-shelf diffusion models.
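The latent-manipulation idea above can be illustrated with a minimal sketch: blend an inverted foreground latent into the background latent inside a user-specified region. The function name, mask shape, and simple linear blend are illustrative assumptions, not TF-ICON's actual attention-based composition scheme:

```python
import numpy as np

def compose_latents(bg_latent, fg_latent, mask):
    """Paste a foreground latent into a background latent under a mask.

    bg_latent, fg_latent: (C, H, W) arrays, e.g. inverted image latents.
    mask: (H, W) array in [0, 1] marking where the foreground goes.
    Illustrative linear blend only; the paper's method is more involved.
    """
    return bg_latent * (1.0 - mask) + fg_latent * mask

# Toy example with 4-channel 8x8 latents
bg = np.zeros((4, 8, 8))          # background latent
fg = np.ones((4, 8, 8))           # foreground latent
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0              # region where the object is composed

out = compose_latents(bg, fg, mask)
```

In the real pipeline, both latents would come from inverting the images with the exceptional prompt, and the blended latent would then be denoised by the pre-trained diffusion model.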
Quick Start & Requirements
Set up dependencies with the provided conda environment (conda env create -f tf_icon_env.yaml) or a Python virtual environment (python -m venv venv, source venv/bin/activate, pip install -e .). The Stable Diffusion v2.1 checkpoint (v2-1_512-ema-pruned.ckpt) must be downloaded to ./ckpt. CUDA 11.3 or compatible drivers are recommended.
Maintenance & Community
The project is the official implementation for a paper presented at ICCV 2023. It builds upon Stable-Diffusion and Prompt-to-Prompt, indicating reliance on established open-source projects.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, its reliance on Stable Diffusion (which has its own license terms) and Prompt-to-Prompt should be considered for commercial use or closed-source integration.
Limitations & Caveats
The minimum VRAM requirement of 20 GB may be a barrier for users with less powerful hardware. The README does not detail specific performance benchmarks or potential failure cases for the "exceptional prompt" across all possible image types.