cycle-diffusion by ChenWu98

PyTorch code for a research paper on the latent space of diffusion models

Created 2 years ago · 640 stars · Top 51.9% on SourcePulse

View on GitHub
Project Summary

This repository provides the official PyTorch implementation for CycleDiffusion, a method for zero-shot image-to-image translation using text-to-image diffusion models. It enables users to transform images based on textual descriptions without requiring paired training data, offering a novel approach to generative image editing.

How It Works

CycleDiffusion formalizes and infers the "random seed" (latent space) of diffusion models from a given real image. By treating the input as a triplet of source image, source text, and target text, it leverages the inherent stochasticity of diffusion models to perform unpaired image-to-image translation. This approach allows for simple, zero-shot editing by manipulating the text prompts.
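The idea of inferring a "random seed" can be illustrated with the closed-form DDPM forward process. The toy sketch below (1-D data and an illustrative schedule value, not the paper's actual encoder) shows that, given a real image and its noised version, the Gaussian noise that "explains" the noised latent is recoverable in closed form; CycleDiffusion's DPM-Encoder applies this reasoning along the whole sampling trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image" under the DDPM forward process
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
x0 = rng.standard_normal(8)      # the "real image"
eps = rng.standard_normal(8)     # the noise, i.e. the "random seed"
alpha_bar = 0.3                  # cumulative noise level at step t (illustrative)

x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Given x_0 and x_t, invert the forward process to recover the noise exactly.
eps_recovered = (x_t - np.sqrt(alpha_bar) * x0) / np.sqrt(1.0 - alpha_bar)

assert np.allclose(eps_recovered, eps)
```

Once the noise is fixed, re-running the sampler with a different text condition reuses the same stochastic "seed," which is what keeps the edited image close to the source.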

Quick Start & Requirements

  • Installation: Clone the repository and set up a Conda environment using environment.yml. Install dependencies including CLIP and taming-transformers.
  • Prerequisites: Python, PyTorch with CUDA support, and wandb for logging.
  • Models: Requires downloading pre-trained checkpoints for Stable Diffusion v1.4 and Latent Diffusion Models.
  • Data: Includes some evaluation data; AFHQ validation set needs to be downloaded separately.
  • Resources: Requires significant GPU resources for running diffusion models.
  • Links: Paper, Diffusers Implementation, HuggingFace Demo
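This repository ships its own scripts, but the Diffusers implementation linked above exposes a `CycleDiffusionPipeline`; the sketch below follows that documented interface. The image path, prompts, and parameter values are placeholders for illustration, not tuned settings, and actually running `edit_image` requires the Stable Diffusion v1.4 checkpoint and a GPU.

```python
# Hypothetical example triplet: (source image, source text, target text).
EXAMPLE_TRIPLET = {
    "image_path": "inputs/horse.png",        # placeholder path
    "source_prompt": "A photo of a horse",
    "target_prompt": "A photo of a zebra",
}


def edit_image(triplet, model_id="CompVis/stable-diffusion-v1-4",
               strength=0.8, guidance_scale=2.0, source_guidance_scale=1.0):
    """Zero-shot edit via the Diffusers CycleDiffusionPipeline.

    Downloads the checkpoint on first use; imports are deferred so this
    module can be loaded without diffusers installed.
    """
    import torch
    from PIL import Image
    from diffusers import CycleDiffusionPipeline, DDIMScheduler

    scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = CycleDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

    init_image = Image.open(triplet["image_path"]).convert("RGB").resize((512, 512))
    result = pipe(
        prompt=triplet["target_prompt"],
        source_prompt=triplet["source_prompt"],
        image=init_image,
        num_inference_steps=100,
        eta=0.1,  # nonzero eta: CycleDiffusion relies on stochastic DDIM
        strength=strength,
        guidance_scale=guidance_scale,
        source_guidance_scale=source_guidance_scale,
    )
    return result.images[0]
```

The two guidance scales let the target prompt steer the edit more strongly than the source prompt reconstructs the input.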

Highlighted Details

  • Zero-shot image-to-image translation with text-to-image diffusion models.
  • Enables unpaired image-to-image translation with diffusion models trained on two domains.
  • Customizable use for user-defined image paths and text pairs.
  • Achieves image editing by manipulating latent space and text prompts.

Maintenance & Community

The project is associated with Carnegie Mellon University. Questions are welcome via GitHub issues; contact Chen Henry Wu for discussions.

Licensing & Compatibility

The project uses the X11 License, which is identical to the MIT License but includes a clause prohibiting the use of copyright holders' names for advertising without permission. This license is generally permissive for commercial use and closed-source linking.

Limitations & Caveats

The setup requires downloading multiple large pre-trained model checkpoints. Some data preparation steps are necessary, and the provided example commands utilize distributed training, suggesting a need for multi-GPU setups for optimal performance.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Starred by Robin Rombach (cofounder of Black Forest Labs), Patrick von Platen (author of Hugging Face Diffusers; research engineer at Mistral), and 2 more.

Explore Similar Projects

Kandinsky-2 by ai-forever (3k stars)
Multilingual text-to-image latent diffusion model
Created 2 years ago · Updated 1 year ago