DiffuseIT  by cyclomon

Diffusion-based image translation research paper

Created 3 years ago
294 stars

Top 89.9% on SourcePulse

Project Summary

DiffuseIT provides a framework for image translation using diffusion models, disentangling style and content representations. It targets researchers and practitioners in computer vision and generative AI, enabling high-quality image transformations guided by text or other images.

How It Works

The approach builds on Blended-diffusion and guided-diffusion. It disentangles style and content representations, which allows flexible image translation tasks. Key components include CLIP for text guidance and, optionally, ArcFace for identity preservation in face translation.
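
As a rough illustration of what text guidance means in this setting, the sketch below computes a cosine-distance loss between an image embedding and a text embedding. It is a generic CLIP-style guidance term written under assumptions, not the loss used by DiffuseIT; the function name `cosine_guidance_loss` is hypothetical, and in practice the embeddings would come from a CLIP image encoder and text encoder rather than hand-written lists.

```python
import math
from typing import Sequence


def cosine_guidance_loss(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Return 1 - cosine similarity; a lower value means a closer image/text match.

    Hypothetical helper for illustration only, not part of the DiffuseIT code.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    if norm_image == 0.0 or norm_text == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_image * norm_text)
```

In a guided sampler, the gradient of a term like this with respect to the current image estimate is what would steer each denoising step toward the prompt.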

Quick Start & Requirements

  • Install: pip install -r requirements.txt (after setting up conda environment).
  • Prerequisites: PyTorch 1.9.0 with CUDA 11.1 (torch==1.9.0+cu111), Python 3.9, ftfy, regex, matplotlib, lpips, kornia, opencv-python, color-matcher, and CLIP from OpenAI.
  • Models: Download pre-trained diffusion models (ImageNet 256x256, FFHQ 256x256) and optionally an ArcFace model into a ./checkpoints folder.
  • Demo: Colab notebooks are available for text-guided and image-guided translation.
  • Links: Paper
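
Before running anything, it can help to confirm that the downloaded models are where the scripts expect them. The sketch below is a minimal check under assumptions: the helper `missing_checkpoints` is hypothetical, and the file names in the usage section are placeholders, since the summary above does not list the actual checkpoint file names.

```python
from pathlib import Path
from typing import Iterable, List


def missing_checkpoints(checkpoint_dir: str, expected_files: Iterable[str]) -> List[str]:
    """Return the expected file names that are not present in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected_files if not (root / name).is_file()]


if __name__ == "__main__":
    # Placeholder names (assumption): replace with the files you actually downloaded.
    expected = ["imagenet_256x256_model.pt", "ffhq_256x256_model.pt"]
    for name in missing_checkpoints("./checkpoints", expected):
        print(f"missing: {name}")
```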

Highlighted Details

  • Supports text-guided image translation with options for content regularization (--regularize_content), noise augmentation (--use_noise_aug_all), progressive contrastive loss (--use_prog_contrast), and range restart (--use_range_restart).
  • Enables image-guided translation with parameters for diffusion iterations (--diff_iter), timestep respacing (--timestep_respacing), skipping timesteps (--skip_timesteps), and color matching (--use_colormatch).
  • Can utilize a single CLIP model for memory saving (--clip_models 'ViT-B/32').
  • Built upon Blended-diffusion, guided-diffusion, FlexIT, and Splicing ViT.
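
The optional flags listed above can be assembled programmatically, for example when sweeping over settings. The sketch below is written under assumptions: the helper `build_text_guided_args` is hypothetical, the four switches are treated as boolean flags that take no value, and `main.py` is a placeholder script name. Any required arguments, such as the prompt and input image, are not covered here and should be taken from the repository's README.

```python
import shlex
from typing import List, Optional


def build_text_guided_args(
    regularize_content: bool = False,
    use_noise_aug_all: bool = False,
    use_prog_contrast: bool = False,
    use_range_restart: bool = False,
    clip_models: Optional[str] = None,
) -> List[str]:
    """Collect the optional flags named in the summary into an argument list."""
    args: List[str] = []
    if regularize_content:
        args.append("--regularize_content")
    if use_noise_aug_all:
        args.append("--use_noise_aug_all")
    if use_prog_contrast:
        args.append("--use_prog_contrast")
    if use_range_restart:
        args.append("--use_range_restart")
    if clip_models is not None:
        # A single CLIP model is the memory-saving option, e.g. "ViT-B/32".
        args += ["--clip_models", clip_models]
    return args


if __name__ == "__main__":
    # "main.py" is a placeholder entry point (assumption), not confirmed by the summary.
    command = ["python", "main.py"] + build_text_guided_args(
        regularize_content=True, use_range_restart=True, clip_models="ViT-B/32"
    )
    print(shlex.join(command))
```

Keeping the flags in a list rather than a single string avoids quoting problems if the list is later passed to `subprocess.run`.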

Maintenance & Community

The project is associated with ICLR 2023. No specific community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project relies on other repositories (Blended-diffusion, guided-diffusion, CLIP), whose licenses would need to be considered for commercial use or closed-source linking.

Limitations & Caveats

The project requires specific older versions of PyTorch (1.9.0) and CUDA (11.1), which may cause compatibility problems with newer hardware and software stacks. Face translation with the FFHQ model can lose the subject's identity; preserving it requires downloading an additional ArcFace model.
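
Because the pinned build matters, a quick check of the installed version can catch mismatches early. The sketch below parses a PyTorch version string of the form `1.9.0+cu111` into its release numbers and build tag. The helper names `parse_torch_version` and `matches_pinned` are hypothetical, and the check only compares strings; it does not verify that the GPU driver supports the CUDA build.

```python
from typing import Optional, Tuple


def parse_torch_version(version: str) -> Tuple[Tuple[int, ...], Optional[str]]:
    """Split a version such as '1.9.0+cu111' into ((1, 9, 0), 'cu111')."""
    release, _, local = version.partition("+")
    # Keep only purely numeric components; pre-release suffixes are ignored.
    numbers = tuple(int(part) for part in release.split(".") if part.isdigit())
    return numbers, (local or None)


def matches_pinned(version: str, pinned: str = "1.9.0+cu111") -> bool:
    """Return True if the version has the same release numbers and build tag as the pin."""
    return parse_torch_version(version) == parse_torch_version(pinned)
```

With PyTorch installed, the check would be called as `matches_pinned(str(torch.__version__))`.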

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
