DiffusionCLIP by gwang-kim

Diffusion model for text-guided image manipulation

Created 3 years ago
857 stars

Top 41.8% on SourcePulse

1 Expert Loves This Project
Project Summary

DiffusionCLIP offers a PyTorch implementation for text-guided image manipulation using diffusion models, addressing limitations of GAN-based methods in reconstructing diverse real images. It enables zero-shot manipulation across unseen domains and multi-attribute editing, targeting researchers and practitioners in computer vision and generative AI.

How It Works

DiffusionCLIP leverages the strong inversion capability and high-quality generation of diffusion models for image manipulation. It introduces novel sampling strategies for fine-tuning diffusion models that preserve reconstruction quality while reducing fine-tuning time. The method supports both in-domain and out-of-domain manipulation, minimizes unintended changes to the input image, and enables multi-attribute transfer through a noise-combination technique that mixes the noise predictions of several attribute-specific fine-tuned models.
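The noise-combination idea above can be sketched as a weighted average of the noise predictions from several single-attribute fine-tuned models. This is a minimal illustrative sketch, not the repository's API: the array shapes, attribute names, and equal weights are assumptions.

```python
import numpy as np

def combine_noise(noise_preds, weights):
    """Sketch of multi-attribute noise combination.

    noise_preds: list of arrays, each the predicted noise from one
                 attribute-specific fine-tuned diffusion model.
    weights:     per-attribute mixing weights; normalized to sum to 1
                 so the combined prediction stays on the same scale.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize to a convex combination
    return sum(wi * eps for wi, eps in zip(w, noise_preds))

# Toy example: two hypothetical "attribute" noise maps on a 1x4 latent.
eps_smile = np.array([1.0, 0.0, 1.0, 0.0])
eps_age   = np.array([0.0, 2.0, 0.0, 2.0])
mixed = combine_noise([eps_smile, eps_age], weights=[1.0, 1.0])
# Equal weights yield the element-wise average of the two predictions.
```

In the actual method the combined noise would drive the reverse diffusion step, so several attributes can be applied in a single sampling pass.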

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION> and pip install -r requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA and CuDNN, Python 3, Anaconda.
  • VRAM: Fine-tuning requires 24GB+ for the original scheme at 256x256, 12GB+ for the GPU-efficient scheme at 256x256, and 24GB+ at 512x512. Inference requires 6GB+ at 256x256 and 9GB+ at 512x512.
  • Resources: Pretrained diffusion models for various datasets (CelebA-HQ, LSUN-Bedroom, LSUN-Church, AFHQ-Dog, ImageNet) are required. Some are auto-downloaded, others need manual placement. An IR-SE50 model is needed for identity loss.
  • Links: Colab Notebook for inference.
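Putting the installation bullets together, a typical setup might look like the following. This is a sketch: the `<CUDA_VERSION>` placeholder must be filled in to match your driver, and the steps should be checked against the repository README.

```shell
# Clone the repository
git clone https://github.com/gwang-kim/DiffusionCLIP.git
cd DiffusionCLIP

# Install PyTorch 1.7.1 with the CUDA toolkit matching your driver
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>

# Install the remaining Python dependencies
pip install -r requirements.txt
```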

Highlighted Details

  • Achieves robust and superior manipulation performance compared to SOTA baselines, confirmed by experiments and human evaluation.
  • Enables accurate in- and out-of-domain manipulation and multi-attribute transfer with reduced manual intervention.
  • Demonstrates manipulation of images from the diverse ImageNet dataset.
  • Supports both original and GPU-efficient fine-tuning schemes.

Maintenance & Community

The accompanying paper was published at CVPR 2022. No community channels or active maintenance signals are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code structure is based on SDEdit and StyleGAN-NADA.

Limitations & Caveats

Fine-tuning requires significant VRAM. Some pretrained models must be manually downloaded and placed in specific directories. The Colab notebook is limited to inference due to VRAM constraints.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

cycle-diffusion by ChenWu98

0%
640
PyTorch implementation of a research paper on diffusion model latent spaces
Created 2 years ago
Updated 1 year ago
Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

0.0%
3k
Multilingual text-to-image latent diffusion model
Created 2 years ago
Updated 1 year ago
Starred by Robin Huang (Cofounder of Comfy Org), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 17 more.

stablediffusion by Stability-AI

0.1%
42k
Latent diffusion model for high-resolution image synthesis
Created 2 years ago
Updated 2 months ago