DiffuseIT by cyclomon

Diffusion-based image translation research paper

created 2 years ago
290 stars

Top 91.7% on sourcepulse

Project Summary

DiffuseIT provides a framework for image translation using diffusion models, disentangling style and content representations. It targets researchers and practitioners in computer vision and generative AI, enabling high-quality image transformations guided by text or other images.

How It Works

The approach leverages diffusion models, building on Blended-diffusion and guided-diffusion. It disentangles style and content through a novel representation, enabling flexible image translation tasks. Key components include CLIP for text guidance and, optionally, ArcFace for identity preservation in face translation.

Quick Start & Requirements

  • Install: pip install -r requirements.txt (after setting up a conda environment).
  • Prerequisites: PyTorch 1.9.0 with CUDA 11.1 (torch==1.9.0+cu111), Python 3.9, ftfy, regex, matplotlib, lpips, kornia, opencv-python, color-matcher, and CLIP from OpenAI.
  • Models: Download pre-trained diffusion models (ImageNet 256x256, FFHQ 256x256) and optionally an ArcFace model into a ./checkpoints folder.
  • Demo: Colab notebooks are available for text-guided and image-guided translation.
  • Links: Paper
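The steps above can be sketched as a setup script. This is illustrative only: the repository URL is inferred from the author and project names, the environment name is arbitrary, and the checkpoint download links are omitted because they live in the upstream README.

```shell
# Clone the repository (URL inferred from the author/repo names; verify upstream)
git clone https://github.com/cyclomon/DiffuseIT.git
cd DiffuseIT

# Python 3.9 environment with the pinned PyTorch/CUDA build
conda create -n diffuseit python=3.9 -y
conda activate diffuseit
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# Remaining dependencies (ftfy, regex, matplotlib, lpips, kornia,
# opencv-python, color-matcher, CLIP)
pip install -r requirements.txt

# Pre-trained diffusion models (ImageNet 256x256, FFHQ 256x256) and the
# optional ArcFace model go here; download links are in the upstream README
mkdir -p checkpoints
```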

Highlighted Details

  • Supports text-guided image translation with options for content regularization (--regularize_content), noise augmentation (--use_noise_aug_all), progressive contrastive loss (--use_prog_contrast), and range restart (--use_range_restart).
  • Enables image-guided translation with parameters for diffusion iterations (--diff_iter), timestep respacing (--timestep_respacing), skipping timesteps (--skip_timesteps), and color matching (--use_colormatch).
  • Can utilize a single CLIP model for memory saving (--clip_models 'ViT-B/32').
  • Built upon Blended-diffusion, guided-diffusion, FlexIT, and Splicing ViT.
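For illustration, the flags above might be combined as follows. The entry-point script name (main.py), the short input flags (-p, -s, -t), and all paths, prompts, and numeric values are assumptions, not taken from the README; only the long-form flags are documented.

```shell
# Text-guided translation (script name, -p/-s, and values are hypothetical)
python main.py -p "a watercolor painting" -s input.png \
    --regularize_content --use_noise_aug_all \
    --use_prog_contrast --use_range_restart \
    --clip_models 'ViT-B/32'   # single CLIP model to save memory

# Image-guided translation with the diffusion-schedule controls
# (numeric values shown are illustrative, not recommended defaults)
python main.py -s source.png -t target_style.png \
    --diff_iter 100 --timestep_respacing 200 --skip_timesteps 80 \
    --use_colormatch
```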

Maintenance & Community

The project accompanies an ICLR 2023 paper. No community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project relies on other repositories (Blended-diffusion, guided-diffusion, CLIP), whose licenses would need to be considered for commercial use or closed-source linking.

Limitations & Caveats

Requires pinned older versions of PyTorch (1.9.0) and CUDA (11.1), which may pose compatibility challenges with newer hardware and software stacks. Face identity can drift when using the FFHQ model; mitigating this requires downloading an additional ArcFace model.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Explore Similar Projects

  • stablediffusion by Stability-AI: latent diffusion model for high-resolution image synthesis (41k stars; created 2 years ago, updated 1 month ago)
  • stable-diffusion by CompVis: latent text-to-image diffusion model (71k stars; created 3 years ago, updated 1 year ago)