DiffusionCLIP  by gwang-kim

Diffusion model for text-guided image manipulation

created 3 years ago
848 stars

Top 43.0% on sourcepulse

1 Expert Loves This Project
Project Summary

DiffusionCLIP offers a PyTorch implementation for text-guided image manipulation using diffusion models, addressing limitations of GAN-based methods in reconstructing diverse real images. It enables zero-shot manipulation across unseen domains and multi-attribute editing, targeting researchers and practitioners in computer vision and generative AI.

How It Works

DiffusionCLIP leverages the strong inversion capabilities and high-quality generation of diffusion models for image manipulation. It introduces novel sampling strategies for fine-tuning diffusion models, allowing for preserved reconstruction quality at increased speeds. The method supports in- and out-of-domain manipulation, minimizes unintended changes, and facilitates multi-attribute transfer through a noise combination technique.
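The inversion property described above can be illustrated with a deterministic DDIM-style update. This is a minimal toy sketch, not the repository's code. It assumes the standard deterministic DDIM formulation, and the name ddim_step, the list-based samples, and the alpha values are all hypothetical. A real model re-predicts eps with a network at every step, so reconstruction is close rather than exact.

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """One deterministic DDIM-style update between two noise levels.

    x          : current sample values (list of floats)
    eps        : noise predicted for x at the current level (same length)
    alpha_from : cumulative alpha-bar at the current level (0 < value <= 1)
    alpha_to   : cumulative alpha-bar at the level being moved to
    """
    out = []
    for xi, ei in zip(x, eps):
        # Clean-image estimate implied by the predicted noise.
        x0_pred = (xi - math.sqrt(1.0 - alpha_from) * ei) / math.sqrt(alpha_from)
        # Re-noise that estimate to the target level with the same noise.
        out.append(math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * ei)
    return out

# Toy check: stepping to a noisier level and back with the same eps is lossless.
x = [0.3, -0.8, 0.5]
eps = [0.1, 0.2, -0.4]  # stand-in for a network's noise prediction
noisier = ddim_step(x, eps, alpha_from=0.9, alpha_to=0.5)          # inversion direction
restored = ddim_step(noisier, eps, alpha_from=0.5, alpha_to=0.9)   # generation direction
```

Because the update has no random term, the same function serves both directions. Only the order of the two alpha arguments changes.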

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies using conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION> and pip install -r requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA and CuDNN, Python 3, Anaconda.
  • VRAM: Fine-tuning requires 24GB+ with the original scheme (256x256), or 12GB+ (256x256) and 24GB+ (512x512) with the GPU-efficient scheme. Inference requires 6GB+ (256x256) or 9GB+ (512x512).
  • Resources: Pretrained diffusion models for various datasets (CelebA-HQ, LSUN-Bedroom, LSUN-Church, AFHQ-Dog, ImageNet) are required. Some are auto-downloaded, others need manual placement. An IR-SE50 model is needed for identity loss.
  • Links: Colab Notebook for inference.
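The VRAM figures above can be turned into a small lookup to check which workflows fit a given GPU. This is a hedged convenience sketch, not part of the project. MIN_VRAM_GB and supported_modes are hypothetical names, and the 24GB+ (512x512) figure is read here as applying to GPU-efficient fine-tuning.

```python
# Minimum VRAM in GB, transcribed from the requirements listed above.
# Assumption: the 512x512 fine-tuning figure belongs to the GPU-efficient scheme.
MIN_VRAM_GB = {
    ("original fine-tuning", 256): 24,
    ("gpu-efficient fine-tuning", 256): 12,
    ("gpu-efficient fine-tuning", 512): 24,
    ("inference", 256): 6,
    ("inference", 512): 9,
}

def supported_modes(available_gb):
    """Return the (mode, resolution) pairs whose stated minimum fits in available_gb."""
    return sorted(key for key, need in MIN_VRAM_GB.items() if available_gb >= need)

# Example: what a 12 GB card can run according to the stated minimums.
modes_for_12gb = supported_modes(12)
```

The stated values are minimums, so actual headroom may vary with batch size and driver overhead.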

Highlighted Details

  • Reports more robust manipulation than state-of-the-art baselines, supported by the authors' experiments and a human evaluation.
  • Enables accurate in- and out-of-domain manipulation and multi-attribute transfer with reduced manual intervention.
  • Demonstrates manipulation of images from the diverse ImageNet dataset.
  • Supports both original and GPU-efficient fine-tuning schemes.
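The multi-attribute transfer listed above relies on combining noise from several sources. One plausible reading, offered only as an assumption, is a weighted blend of the noise predicted by models fine-tuned for different attributes, with weights that sum to one. The function name combine_noise and the list-based inputs are hypothetical.

```python
def combine_noise(noise_predictions, weights):
    """Blend per-attribute noise predictions into a single prediction.

    noise_predictions : list of equal-length lists, one per fine-tuned model
    weights           : one non-negative weight per prediction, summing to 1
    """
    if len(noise_predictions) != len(weights):
        raise ValueError("need exactly one weight per noise prediction")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    length = len(noise_predictions[0])
    if any(len(pred) != length for pred in noise_predictions):
        raise ValueError("all noise predictions must have the same length")
    # Element-wise weighted sum across the models' predictions.
    return [
        sum(w * pred[i] for w, pred in zip(weights, noise_predictions))
        for i in range(length)
    ]

# Example: equal mix of two hypothetical attribute models.
blended = combine_noise([[1.0, 0.0], [0.0, 2.0]], [0.5, 0.5])
```

Shifting weight toward one model would, under this reading, strengthen that attribute's influence on the result.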

Maintenance & Community

The accompanying paper appeared at CVPR 2022. The README lists no community channels and gives no sign of active maintenance.

Licensing & Compatibility

The repository does not explicitly state a license. The code structure is based on SDEdit and StyleGAN-NADA.

Limitations & Caveats

Fine-tuning requires significant VRAM. Some pretrained models must be manually downloaded and placed in specific directories. The Colab notebook is limited to inference due to VRAM constraints.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

9 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai

0.0%
6k
PyTorch code for consistency models research paper
created 2 years ago
updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 3 more.

guided-diffusion by openai

0.2%
7k
Image synthesis codebase for diffusion models
created 4 years ago
updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 4 more.

taming-transformers by CompVis

0.1%
6k
Image synthesis research paper using transformers
created 4 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

0.1%
41k
Latent diffusion model for high-resolution image synthesis
created 2 years ago
updated 1 month ago