cycle-diffusion by ChenWu98

PyTorch code for diffusion model latent space research paper

created 2 years ago
631 stars

Top 53.4% on sourcepulse

Project Summary

This repository provides the official PyTorch implementation for CycleDiffusion, a method for zero-shot image-to-image translation using text-to-image diffusion models. It enables users to transform images based on textual descriptions without requiring paired training data, offering a novel approach to generative image editing.

How It Works

CycleDiffusion formalizes and infers the "random seed" (latent space) of diffusion models from a given real image. By treating the input as a triplet of source image, source text, and target text, it leverages the inherent stochasticity of diffusion models to perform unpaired image-to-image translation. This approach allows for simple, zero-shot editing by manipulating the text prompts.
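The core idea — that a stochastic DDPM-style sampler's noise draws can be recovered exactly from the trajectory and then replayed — can be illustrated with a toy sampler. This is a minimal NumPy sketch, not the paper's actual denoiser or schedule: `mu` stands in for the model's predicted posterior mean, and the schedule values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 5
sigmas = np.linspace(0.8, 0.1, T)   # toy per-step noise scales

def mu(x, t):
    # Toy stand-in for the model's posterior mean mu_theta(x_t, t).
    return 0.9 * x

# Forward: sample a stochastic trajectory with known noises.
x = rng.standard_normal(4)          # x_T, the initial Gaussian draw
traj = [x]
true_noise = []
for t in range(T):
    z = rng.standard_normal(4)      # z_t ~ N(0, I)
    x = mu(x, t) + sigmas[t] * z    # x_{t-1} = mu(x_t) + sigma_t * z_t
    traj.append(x)
    true_noise.append(z)

# Inversion: recover each z_t from consecutive states, as CycleDiffusion
# does when inferring the "random seed" behind a given real image.
recovered = [(traj[t + 1] - mu(traj[t], t)) / sigmas[t] for t in range(T)]

# Replay: running the sampler again with the recovered noises
# reproduces the trajectory (and hence the final image).
x = traj[0]
for t in range(T):
    x = mu(x, t) + sigmas[t] * recovered[t]

print(np.allclose(recovered, true_noise))  # True
print(np.allclose(x, traj[-1]))            # True
```

In CycleDiffusion, the latent code is this tuple of recovered noises (plus x_T); editing happens because the same code is replayed while the text conditioning fed to the denoiser is swapped from the source prompt to the target prompt.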

Quick Start & Requirements

  • Installation: Clone the repository and set up a Conda environment using environment.yml. Install dependencies including CLIP and taming-transformers.
  • Prerequisites: PyTorch with CUDA support, Python, wandb for logging.
  • Models: Requires downloading pre-trained checkpoints for Stable Diffusion v1.4 and Latent Diffusion Models.
  • Data: Includes some evaluation data; AFHQ validation set needs to be downloaded separately.
  • Resources: Requires significant GPU resources for running diffusion models.
  • Links: Paper, Diffusers Implementation, HuggingFace Demo

Highlighted Details

  • Zero-shot image-to-image translation with text-to-image diffusion models.
  • Enables unpaired image-to-image translation using two diffusion models independently trained on the source and target domains.
  • Customizable use for user-defined image paths and text pairs.
  • Achieves image editing by manipulating latent space and text prompts.

Maintenance & Community

The project is associated with Carnegie Mellon University. GitHub issues are welcome for questions; contact Chen Henry Wu for discussions.

Licensing & Compatibility

The project uses the X11 License, which matches the MIT License except for an added clause prohibiting use of the copyright holders' names in advertising without permission. The license is generally permissive for commercial use and closed-source linking.

Limitations & Caveats

The setup requires downloading multiple large pre-trained model checkpoints. Some data preparation steps are necessary, and the provided example commands utilize distributed training, suggesting a need for multi-GPU setups for optimal performance.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

  • Latent text-to-image diffusion model
  • 0.1% · 71k stars
  • created 3 years ago, updated 1 year ago