img2img-turbo by GaParmar

Research paper for one-step image-to-image translation using Stable Diffusion Turbo

Created 1 year ago

2,350 stars

Top 19.2% on SourcePulse

Project Summary

This repository provides a method for adapting single-step diffusion models such as SD-Turbo to image-to-image translation tasks through adversarial learning. It targets researchers and developers who need efficient, high-quality image translation, for example sketch-to-image or day-to-night conversion, at near real-time inference speeds.

How It Works

The approach integrates the three modules of a latent diffusion model (encoder, U-Net, and decoder) into a single end-to-end network with a small number of trainable weights, most of them in LoRA adapters. By retraining the first layer of the U-Net and adding skip connections with Zero-Convs, the model translates an image in a single step while preserving the structure of the input. The same architecture covers both paired (pix2pix-turbo) and unpaired (CycleGAN-Turbo) translation tasks.
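The two building blocks named above, LoRA adapters and zero-initialized skip connections, can be illustrated with a minimal sketch. This is not the repository's code: it uses plain Python lists instead of PyTorch tensors, and the class names `LoRALinear` and `ZeroConvSkip`, the `rank` and `alpha` parameters, and the initialization scale are all assumptions chosen for illustration. It shows only the general idea that both components start as a no-op on top of frozen pretrained weights and add few trainable parameters.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (out x in) plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # pretrained weight, kept frozen
        # A gets small random values, B starts at zero, so B @ A is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scaling = alpha / rank

    def forward(self, x):
        """Apply (W + scaling * B @ A) to the input vector x."""
        delta = matmul(self.B, self.A)
        return [
            sum((w + self.scaling * d) * xi for w, d, xi in zip(w_row, d_row, x))
            for w_row, d_row in zip(self.weight, delta)
        ]

    def trainable_params(self):
        """Count adapter weights only; the frozen weight is excluded."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


class ZeroConvSkip:
    """Skip connection through a 1x1 convolution whose weights start at zero."""

    def __init__(self, channels):
        self.weight = [[0.0] * channels for _ in range(channels)]

    def forward(self, decoder_feat, encoder_feat):
        """Add the projected encoder features to the decoder features, pixel by pixel."""
        out = []
        for dec_px, enc_px in zip(decoder_feat, encoder_feat):
            proj = [sum(w * v for w, v in zip(row, enc_px)) for row in self.weight]
            out.append([d + p for d, p in zip(dec_px, proj)])
        return out


# Hypothetical 4x4 frozen layer (identity) with a rank-1 adapter.
frozen = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(frozen, rank=1)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))  # same as the frozen layer at initialization
print(layer.trainable_params())             # 8 adapter weights versus 16 frozen ones

skip = ZeroConvSkip(channels=2)
print(skip.forward([[1.0, 2.0]], [[5.0, 6.0]]))  # decoder features pass through unchanged
```

Because `B` and the skip weights start at zero, the adapted network initially reproduces the pretrained model's output, and training only has to learn the deviation needed for the translation task.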

Quick Start & Requirements

Install: run conda env create -f environment.yaml followed by conda activate img2img-turbo, or run pip install -r requirements.txt inside a virtual environment.
Prerequisites: Python and either Conda or pip with a virtual environment. A GPU is required for inference.
Demo: Gradio demos are provided as gradio_sketch2image.py and gradio_canny2image.py.
Docs: Paper

Highlighted Details

Inference on a 512x512 image takes 0.29 s on an A6000 and 0.11 s on an A100.
CycleGAN-Turbo is reported to outperform existing GAN-based and diffusion-based methods on unpaired translation.
pix2pix-turbo is reported to match ControlNet on paired tasks while using one-step inference.
Supports diverse output generation by varying input noise maps and controlling style via text prompts.
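
To put the reported latencies in terms of throughput, the short calculation below converts seconds per image into images per second. It is a back-of-the-envelope sketch that assumes images are processed one at a time and ignores model loading and pre- and post-processing; the helper name `images_per_second` is made up for this example.

```python
# Reported per-image latencies for 512x512 inputs, in seconds.
latencies_s = {"A6000": 0.29, "A100": 0.11}


def images_per_second(latency_s):
    """Convert a per-image latency into sequential throughput."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_s


for gpu, latency in latencies_s.items():
    print(f"{gpu}: {images_per_second(latency):.1f} images/s")
# A6000: 3.4 images/s
# A100: 9.1 images/s
```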

Maintenance & Community

The project is associated with authors from CMU and Adobe. Links to community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The project uses SD-Turbo as its base model, so any use of the weights is subject to that model's license. The README does not state a separate license for the adapted code. SD-Turbo is distributed under Stability AI's own license terms rather than a standard permissive license, so those terms should be checked before any commercial use.

Limitations & Caveats

The README focuses on successful applications and does not detail known limitations, unsupported tasks, or potential issues with the adversarial training process. The training section links to external steps rather than providing them directly.

Health Check

Last Commit

5 months ago

Responsiveness

1+ week

Star History

22 stars in the last 30 days