Diffusion model for text-guided image manipulation
DiffusionCLIP offers a PyTorch implementation for text-guided image manipulation using diffusion models, addressing limitations of GAN-based methods in reconstructing diverse real images. It enables zero-shot manipulation across unseen domains and multi-attribute editing, targeting researchers and practitioners in computer vision and generative AI.
How It Works
DiffusionCLIP leverages the strong inversion capability and high-quality generation of diffusion models for image manipulation. It introduces novel sampling strategies for fine-tuning diffusion models that preserve reconstruction quality while reducing sampling time. The method supports both in-domain and out-of-domain manipulation, minimizes unintended changes, and enables multi-attribute transfer through a noise combination technique.
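The noise combination idea can be sketched as blending the noise predictions of several attribute-specific fine-tuned models during reverse sampling. Below is a minimal PyTorch illustration under assumed conventions: the models, their weights, and the model(x_t, t) signature are hypothetical stand-ins, not the repository's actual API.

import torch

# Sketch: blend per-attribute noise predictions,
# eps = sum_i w_i * eps_i(x_t, t), for multi-attribute editing.
def combined_noise(models, weights, x_t, t):
    eps = torch.zeros_like(x_t)
    for model, w in zip(models, weights):
        eps = eps + w * model(x_t, t)  # hypothetical model(x_t, t) -> noise
    return eps

# One deterministic DDIM reverse step (eta = 0) using the blended noise;
# alpha_bar_t and alpha_bar_prev are cumulative schedule terms (tensors).
def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    x0_pred = (x_t - (1.0 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    return alpha_bar_prev.sqrt() * x0_pred + (1.0 - alpha_bar_prev).sqrt() * eps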
Quick Start & Requirements
Set up the environment with:
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
pip install -r requirements.txt
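After installation, a quick sanity check with standard PyTorch calls confirms the version and that a CUDA device is visible (fine-tuning in this project assumes a GPU):

import torch

# Print the installed PyTorch version and CUDA availability.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())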
Highlighted Details
Maintenance & Community
The project accompanies a CVPR 2022 paper. The README does not mention community channels or recent maintenance activity.
Licensing & Compatibility
The repository does not explicitly state a license. The code structure is based on SDEdit and StyleGAN-NADA.
Limitations & Caveats
Fine-tuning requires significant VRAM. Some pretrained models must be manually downloaded and placed in specific directories. The Colab notebook is limited to inference due to VRAM constraints.