PyTorch code for a research paper on the latent space of diffusion models
This repository provides the official PyTorch implementation for CycleDiffusion, a method for zero-shot image-to-image translation using text-to-image diffusion models. It enables users to transform images based on textual descriptions without requiring paired training data, offering a novel approach to generative image editing.
How It Works
CycleDiffusion formalizes the "random seed" (latent code) of diffusion models and infers it from a given real image. Treating the input as a triplet of source image, source text, and target text, it leverages the inherent stochasticity of diffusion models to perform unpaired image-to-image translation: re-running the denoising process with the inferred seed but the target text yields the edited image. This makes simple, zero-shot editing possible by manipulating the text prompts alone.
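For illustration, here is a minimal sketch using the Hugging Face diffusers integration of this method (`CycleDiffusionPipeline`), not this repository's own training/translation scripts; the arguments follow the diffusers documentation and may differ across diffusers versions, and the input file name is hypothetical:

```python
import torch
from PIL import Image
from diffusers import CycleDiffusionPipeline, DDIMScheduler

# CycleDiffusion requires a DDIM scheduler so the latent "seed" can be inferred.
model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler).to("cuda")

init_image = Image.open("horse.png").convert("RGB").resize((512, 512))  # hypothetical input image

image = pipe(
    prompt="An astronaut riding an elephant",     # target text
    source_prompt="An astronaut riding a horse",  # source text
    image=init_image,                             # source image
    num_inference_steps=100,
    eta=0.1,                  # eta > 0 keeps the DDIM process stochastic
    strength=0.8,
    guidance_scale=2.0,
    source_guidance_scale=1.0,
).images[0]
image.save("elephant.png")
```

Note how the edit is expressed entirely as a change between `source_prompt` and `prompt`; no paired data or fine-tuning is involved.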
Quick Start & Requirements
Create the conda environment from `environment.yml`, then install the remaining dependencies, including CLIP and taming-transformers. `wandb` is used for logging.
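For reference, `wandb` logging is typically initialized along these lines; the project and run names below are hypothetical and not taken from this repository:

```python
import wandb

# Hypothetical setup; the repo's actual project/entity names may differ.
# mode="offline" lets the sketch run without a wandb account.
wandb.init(project="cycle-diffusion", name="demo-run", mode="offline")
wandb.log({"loss": 0.123, "step": 1})  # metrics logged during training/translation
wandb.finish()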
Maintenance & Community
The project is associated with Carnegie Mellon University. GitHub issues are welcome for questions, and Chen Henry Wu can be contacted for discussions.
Licensing & Compatibility
The project uses the X11 License, which is identical to the MIT License but includes a clause prohibiting the use of copyright holders' names for advertising without permission. This license is generally permissive for commercial use and closed-source linking.
Limitations & Caveats
The setup requires downloading multiple large pre-trained model checkpoints. Some data preparation steps are necessary, and the provided example commands use distributed training, so a multi-GPU setup is assumed for the best performance (see the single-GPU fallback sketch below).
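If only one GPU is available, scripts written against `torch.distributed` can often still be run as a single-process group. This is a generic PyTorch sketch, not this repository's launcher, which may instead rely on `torchrun` or `torch.distributed.launch`:

```python
import os
import torch
import torch.distributed as dist

# Generic single-GPU fallback: initialize a 1-process group so that
# DDP-style code (dist.get_rank(), DistributedDataParallel, ...) still runs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(
    backend="nccl" if torch.cuda.is_available() else "gloo",
    rank=0,
    world_size=1,
)

# ... training / translation code goes here ...

dist.destroy_process_group()
```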