Diffusion-based image translation research paper
DiffuseIT provides a framework for image translation using diffusion models, disentangling style and content representations. It targets researchers and practitioners in computer vision and generative AI, enabling high-quality image transformations guided by text or other images.
How It Works
The approach leverages diffusion models, building upon Blended-diffusion and guided-diffusion. It separates style from content via disentangled representations, allowing flexible image translation tasks. Key components include CLIP for text guidance and, optionally, ArcFace for identity preservation in face translation.
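In spirit, guidance injects the gradient of a matching loss (a CLIP-based loss in DiffuseIT) into each denoising update. A toy 1-D numpy sketch of guided annealed sampling illustrates the idea; the quadratic loss stands in for CLIP, and none of these names are DiffuseIT's actual API:

```python
import numpy as np

TARGET = 2.0  # stand-in for "the output the guidance loss prefers"

def guidance_grad(x):
    # Gradient of a toy quadratic loss 0.5*(x - TARGET)^2, negated for descent;
    # in DiffuseIT this role is played by the gradient of a CLIP-based loss.
    return TARGET - x

def guided_sample(steps=100, lr=0.1, seed=0):
    # Annealed, noise-injected descent: the noise level decays over the
    # schedule, loosely mimicking how reverse diffusion anneals noise away.
    rng = np.random.default_rng(seed)
    x = rng.normal()  # start from pure noise
    for sigma in np.linspace(1.0, 0.01, steps):
        x += lr * guidance_grad(x) + 0.05 * sigma * rng.normal()
    return x

print(guided_sample())  # drifts close to TARGET
```

The real method operates on image latents with a learned denoiser, but the same principle applies: the guidance gradient steers each step toward the text- or image-specified target.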
Quick Start & Requirements
After setting up a conda environment, install dependencies with pip install -r requirements.txt. Key requirements: Python 3.9, torch==1.9.0+cu111, ftfy, regex, matplotlib, lpips, kornia, opencv-python, color-matcher, and CLIP from OpenAI.
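A typical setup sequence, based on the requirements above (the environment name and the CUDA wheel index are illustrative, not prescribed by the repo):

```shell
# Create and activate a Python 3.9 environment (name is arbitrary)
conda create -n diffuseit python=3.9 -y
conda activate diffuseit

# Pinned PyTorch build; the +cu111 wheel requires the CUDA 11.1 wheel index
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

# Remaining dependencies (ftfy, regex, matplotlib, lpips, kornia, ...)
pip install -r requirements.txt
```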
Pretrained checkpoints are placed in the ./checkpoints folder.

Highlighted Details
Optional quality controls include content regularization (--regularize_content), noise augmentation (--use_noise_aug_all), progressive contrastive loss (--use_prog_contrast), and range restart (--use_range_restart). Sampling is configurable via the number of diffusion iterations (--diff_iter), timestep respacing (--timestep_respacing), skipped timesteps (--skip_timesteps), and color matching (--use_colormatch). The CLIP backbone can be selected (e.g., --clip_models 'ViT-B/32').
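The flag surface can be sketched with argparse; the flag names come from the README, but the defaults and the parser itself are assumptions, not DiffuseIT's code:

```python
import argparse

def build_parser():
    # Flags as listed above; default values here are illustrative only.
    p = argparse.ArgumentParser(description="DiffuseIT-style options (sketch)")
    p.add_argument("--regularize_content", action="store_true")
    p.add_argument("--use_noise_aug_all", action="store_true")
    p.add_argument("--use_prog_contrast", action="store_true")
    p.add_argument("--use_range_restart", action="store_true")
    p.add_argument("--use_colormatch", action="store_true")
    p.add_argument("--diff_iter", type=int, default=100)
    p.add_argument("--timestep_respacing", type=str, default="200")
    p.add_argument("--skip_timesteps", type=int, default=80)
    p.add_argument("--clip_models", nargs="+", default=["ViT-B/32"])
    return p

args = build_parser().parse_args(["--use_colormatch", "--diff_iter", "50"])
print(args.use_colormatch, args.diff_iter)  # True 50
```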
Maintenance & Community
The project is associated with ICLR 2023. No specific community channels or active maintenance signals are detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. The project relies on other repositories (Blended-diffusion, guided-diffusion, CLIP), whose licenses would need to be considered for commercial use or closed-source linking.
Limitations & Caveats
Requires specific older versions of PyTorch (1.9.0) and CUDA (11.1), which may pose compatibility challenges with newer hardware and software stacks. Potential for face identity loss when using the FFHQ model, requiring an additional ArcFace model download.
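Given the hard pin on torch 1.9.0+cu111, a small stdlib check (a hypothetical helper, not part of the repo) can flag a mismatched install before a long run:

```python
def matches_pin(installed: str, pinned: str = "1.9.0+cu111") -> bool:
    # Compare the base version and the local build tag (e.g. CUDA suffix)
    # separately, so "1.9.0+cpu" does not pass for "1.9.0+cu111".
    def split(v):
        base, _, local = v.partition("+")
        return tuple(int(part) for part in base.split(".")), local
    return split(installed) == split(pinned)

try:
    import torch
    if not matches_pin(torch.__version__):
        print(f"warning: torch {torch.__version__} != pinned 1.9.0+cu111")
except ImportError:
    print("torch not installed")
```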