DiffusionCLIP by gwang-kim

Diffusion model for text-guided image manipulation

Created 3 years ago
857 stars

Top 41.8% on SourcePulse

1 Expert Loves This Project
Project Summary

DiffusionCLIP offers a PyTorch implementation for text-guided image manipulation using diffusion models, addressing limitations of GAN-based methods in reconstructing diverse real images. It enables zero-shot manipulation across unseen domains and multi-attribute editing, targeting researchers and practitioners in computer vision and generative AI.

How It Works

DiffusionCLIP leverages the strong inversion capability and high-quality generation of diffusion models for image manipulation. It introduces novel sampling strategies for fine-tuning diffusion models that preserve reconstruction quality while reducing fine-tuning time. The method supports both in-domain and out-of-domain manipulation, minimizes unintended changes to the input image, and enables multi-attribute transfer through a noise-combination technique that mixes the noise predictions of several attribute-specific fine-tuned models.
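The noise-combination idea above can be sketched as a weighted average of the noise predictions from several single-attribute fine-tuned models. This is a minimal illustrative sketch, not the repository's API: the array shapes, attribute names, and equal weights are assumptions.

```python
import numpy as np

def combine_noise(noise_preds, weights):
    """Sketch of multi-attribute noise combination.

    noise_preds: list of arrays, each the predicted noise from one
                 attribute-specific fine-tuned diffusion model.
    weights:     per-attribute mixing weights; normalized to sum to 1
                 so the combined prediction stays on the same scale.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize to a convex combination
    return sum(wi * eps for wi, eps in zip(w, noise_preds))

# Toy example: two hypothetical "attribute" noise maps on a 1x4 latent.
eps_smile = np.array([1.0, 0.0, 1.0, 0.0])
eps_age   = np.array([0.0, 2.0, 0.0, 2.0])
mixed = combine_noise([eps_smile, eps_age], weights=[1.0, 1.0])
# Equal weights yield the element-wise average of the two predictions.
```

In the actual method the combined noise would drive the reverse diffusion step, so several attributes can be applied in a single sampling pass.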

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION> and pip install -r requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA and CuDNN, Python 3, Anaconda.
  • VRAM: Fine-tuning requires 24GB+ for the original scheme at 256x256, 12GB+ for the GPU-efficient scheme at 256x256, and 24GB+ at 512x512. Inference requires 6GB+ at 256x256 and 9GB+ at 512x512.
  • Resources: Pretrained diffusion models for various datasets (CelebA-HQ, LSUN-Bedroom, LSUN-Church, AFHQ-Dog, ImageNet) are required. Some are auto-downloaded, others need manual placement. An IR-SE50 model is needed for identity loss.
  • Links: Colab Notebook for inference.
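Putting the installation bullets together, a typical setup might look like the following. This is a sketch: the `<CUDA_VERSION>` placeholder must be filled in to match your driver, and the steps should be checked against the repository README.

```shell
# Clone the repository
git clone https://github.com/gwang-kim/DiffusionCLIP.git
cd DiffusionCLIP

# Install PyTorch 1.7.1 with the CUDA toolkit matching your driver
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>

# Install the remaining Python dependencies
pip install -r requirements.txt
```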

Highlighted Details

  • Achieves robust and superior manipulation performance compared to SOTA baselines, confirmed by experiments and human evaluation.
  • Enables accurate in- and out-of-domain manipulation and multi-attribute transfer with reduced manual intervention.
  • Demonstrates manipulation of images from the diverse ImageNet dataset.
  • Supports both original and GPU-efficient fine-tuning schemes.

Maintenance & Community

The accompanying paper was published at CVPR 2022. No community channels or active maintenance signals are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code structure is based on SDEdit and StyleGAN-NADA.

Limitations & Caveats

Fine-tuning requires significant VRAM. Some pretrained models must be manually downloaded and placed in specific directories. The Colab notebook is limited to inference due to VRAM constraints.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

cycle-diffusion by ChenWu98

0%
640
PyTorch implementation of a research paper on diffusion model latent spaces
Created 2 years ago
Updated 1 year ago
Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

0.0%
3k
Multilingual text-to-image latent diffusion model
Created 2 years ago
Updated 1 year ago
Starred by Robin Huang (Cofounder of Comfy Org), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 17 more.

stablediffusion by Stability-AI

0.1%
42k
Latent diffusion model for high-resolution image synthesis
Created 2 years ago
Updated 2 months ago