DiffuseIT by cyclomon

Diffusion-based image translation research paper

created 2 years ago
290 stars

Top 91.7% on sourcepulse

Project Summary

DiffuseIT provides a framework for image translation using diffusion models, disentangling style and content representations. It targets researchers and practitioners in computer vision and generative AI, enabling high-quality image transformations guided by text or other images.

How It Works

The approach leverages diffusion models, building on Blended-diffusion and guided-diffusion. It disentangles style and content through a novel representation, enabling flexible image translation tasks. Key components include CLIP for text guidance and, optionally, ArcFace for identity preservation in face translation.

Quick Start & Requirements

  • Install: pip install -r requirements.txt (after setting up a conda environment).
  • Prerequisites: PyTorch 1.9.0 with CUDA 11.1 (torch==1.9.0+cu111), Python 3.9, ftfy, regex, matplotlib, lpips, kornia, opencv-python, color-matcher, and CLIP from OpenAI.
  • Models: Download pre-trained diffusion models (ImageNet 256x256, FFHQ 256x256) and optionally an ArcFace model into a ./checkpoints folder.
  • Demo: Colab notebooks are available for text-guided and image-guided translation.
  • Links: Paper
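The steps above can be sketched as a setup script. This is illustrative only: the repository URL is inferred from the author and project names, the environment name is arbitrary, and the checkpoint download links are omitted because they live in the upstream README.

```shell
# Clone the repository (URL inferred from the author/repo names; verify upstream)
git clone https://github.com/cyclomon/DiffuseIT.git
cd DiffuseIT

# Python 3.9 environment with the pinned PyTorch/CUDA build
conda create -n diffuseit python=3.9 -y
conda activate diffuseit
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# Remaining dependencies (ftfy, regex, matplotlib, lpips, kornia,
# opencv-python, color-matcher, CLIP)
pip install -r requirements.txt

# Pre-trained diffusion models (ImageNet 256x256, FFHQ 256x256) and the
# optional ArcFace model go here; download links are in the upstream README
mkdir -p checkpoints
```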

Highlighted Details

  • Supports text-guided image translation with options for content regularization (--regularize_content), noise augmentation (--use_noise_aug_all), progressive contrastive loss (--use_prog_contrast), and range restart (--use_range_restart).
  • Enables image-guided translation with parameters for diffusion iterations (--diff_iter), timestep respacing (--timestep_respacing), skipping timesteps (--skip_timesteps), and color matching (--use_colormatch).
  • Can utilize a single CLIP model for memory saving (--clip_models 'ViT-B/32').
  • Built upon Blended-diffusion, guided-diffusion, FlexIT, and Splicing ViT.
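For illustration, the flags above might be combined as follows. The entry-point script name (main.py), the short input flags (-p, -s, -t), and all paths, prompts, and numeric values are assumptions, not taken from the README; only the long-form flags are documented.

```shell
# Text-guided translation (script name, -p/-s, and values are hypothetical)
python main.py -p "a watercolor painting" -s input.png \
    --regularize_content --use_noise_aug_all \
    --use_prog_contrast --use_range_restart \
    --clip_models 'ViT-B/32'   # single CLIP model to save memory

# Image-guided translation with the diffusion-schedule controls
# (numeric values shown are illustrative, not recommended defaults)
python main.py -s source.png -t target_style.png \
    --diff_iter 100 --timestep_respacing 200 --skip_timesteps 80 \
    --use_colormatch
```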

Maintenance & Community

The project accompanies an ICLR 2023 paper. No community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project relies on other repositories (Blended-diffusion, guided-diffusion, CLIP), whose licenses would need to be considered for commercial use or closed-source linking.

Limitations & Caveats

Requires pinned older versions of PyTorch (1.9.0) and CUDA (11.1), which may pose compatibility challenges with newer hardware and software stacks. Face identity can drift when using the FFHQ model; mitigating this requires downloading an additional ArcFace model.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Explore Similar Projects

  • stablediffusion by Stability-AI: latent diffusion model for high-resolution image synthesis (41k stars; created 2 years ago, updated 1 month ago)
  • stable-diffusion by CompVis: latent text-to-image diffusion model (71k stars; created 3 years ago, updated 1 year ago)