cycle-diffusion by ChenWu98

PyTorch code for a research paper on the latent space of diffusion models

Created 2 years ago · 640 stars · Top 51.9% on SourcePulse

View on GitHub
Project Summary

This repository provides the official PyTorch implementation for CycleDiffusion, a method for zero-shot image-to-image translation using text-to-image diffusion models. It enables users to transform images based on textual descriptions without requiring paired training data, offering a novel approach to generative image editing.

How It Works

CycleDiffusion formalizes and infers the "random seed" (latent space) of diffusion models from a given real image. By treating the input as a triplet of source image, source text, and target text, it leverages the inherent stochasticity of diffusion models to perform unpaired image-to-image translation. This approach allows for simple, zero-shot editing by manipulating the text prompts.
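The idea of inferring a "random seed" can be illustrated with the closed-form DDPM forward process. The toy sketch below (1-D data and an illustrative schedule value, not the paper's actual encoder) shows that, given a real image and its noised version, the Gaussian noise that "explains" the noised latent is recoverable in closed form; CycleDiffusion's DPM-Encoder applies this reasoning along the whole sampling trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image" under the DDPM forward process
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
x0 = rng.standard_normal(8)      # the "real image"
eps = rng.standard_normal(8)     # the noise, i.e. the "random seed"
alpha_bar = 0.3                  # cumulative noise level at step t (illustrative)

x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Given x_0 and x_t, invert the forward process to recover the noise exactly.
eps_recovered = (x_t - np.sqrt(alpha_bar) * x0) / np.sqrt(1.0 - alpha_bar)

assert np.allclose(eps_recovered, eps)
```

Once the noise is fixed, re-running the sampler with a different text condition reuses the same stochastic "seed," which is what keeps the edited image close to the source.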

Quick Start & Requirements

  • Installation: Clone the repository and set up a Conda environment using environment.yml. Install dependencies including CLIP and taming-transformers.
  • Prerequisites: Python, PyTorch with CUDA support, and wandb for logging.
  • Models: Requires downloading pre-trained checkpoints for Stable Diffusion v1.4 and Latent Diffusion Models.
  • Data: Includes some evaluation data; AFHQ validation set needs to be downloaded separately.
  • Resources: Requires significant GPU resources for running diffusion models.
  • Links: Paper, Diffusers Implementation, HuggingFace Demo
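This repository ships its own scripts, but the Diffusers implementation linked above exposes a `CycleDiffusionPipeline`; the sketch below follows that documented interface. The image path, prompts, and parameter values are placeholders for illustration, not tuned settings, and actually running `edit_image` requires the Stable Diffusion v1.4 checkpoint and a GPU.

```python
# Hypothetical example triplet: (source image, source text, target text).
EXAMPLE_TRIPLET = {
    "image_path": "inputs/horse.png",        # placeholder path
    "source_prompt": "A photo of a horse",
    "target_prompt": "A photo of a zebra",
}


def edit_image(triplet, model_id="CompVis/stable-diffusion-v1-4",
               strength=0.8, guidance_scale=2.0, source_guidance_scale=1.0):
    """Zero-shot edit via the Diffusers CycleDiffusionPipeline.

    Downloads the checkpoint on first use; imports are deferred so this
    module can be loaded without diffusers installed.
    """
    import torch
    from PIL import Image
    from diffusers import CycleDiffusionPipeline, DDIMScheduler

    scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = CycleDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

    init_image = Image.open(triplet["image_path"]).convert("RGB").resize((512, 512))
    result = pipe(
        prompt=triplet["target_prompt"],
        source_prompt=triplet["source_prompt"],
        image=init_image,
        num_inference_steps=100,
        eta=0.1,  # nonzero eta: CycleDiffusion relies on stochastic DDIM
        strength=strength,
        guidance_scale=guidance_scale,
        source_guidance_scale=source_guidance_scale,
    )
    return result.images[0]
```

The two guidance scales let the target prompt steer the edit more strongly than the source prompt reconstructs the input.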

Highlighted Details

  • Zero-shot image-to-image translation with text-to-image diffusion models.
  • Enables unpaired image-to-image translation with diffusion models trained on two domains.
  • Customizable use for user-defined image paths and text pairs.
  • Achieves image editing by manipulating latent space and text prompts.

Maintenance & Community

The project is associated with Carnegie Mellon University. Questions are welcome via GitHub issues; contact Chen Henry Wu for discussions.

Licensing & Compatibility

The project uses the X11 License, which is identical to the MIT License but includes a clause prohibiting the use of copyright holders' names for advertising without permission. This license is generally permissive for commercial use and closed-source linking.

Limitations & Caveats

The setup requires downloading multiple large pre-trained model checkpoints. Some data preparation steps are necessary, and the provided example commands utilize distributed training, suggesting a need for multi-GPU setups for optimal performance.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Starred by Robin Rombach (cofounder of Black Forest Labs), Patrick von Platen (author of Hugging Face Diffusers; research engineer at Mistral), and 2 more.

Explore Similar Projects

Kandinsky-2 by ai-forever (3k stars)
Multilingual text-to-image latent diffusion model
Created 2 years ago · Updated 1 year ago