Universal-Guided-Diffusion by arpitbansal297

PyTorch code for universal diffusion guidance

created 2 years ago
491 stars

Top 63.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Universal Guidance for Diffusion Models, enabling control over image generation using arbitrary modalities like human identity, segmentation maps, object locations, and style without retraining. It targets researchers and developers working with diffusion models who need flexible conditioning beyond text prompts.

How It Works

The core approach modifies the diffusion sampling process to incorporate guidance signals alongside text conditioning, building on Stable Diffusion and OpenAI's ImageNet Diffusion Model. Forward guidance perturbs each denoising step with the gradient of an off-the-shelf differentiable loss (face recognition, segmentation, object detection, CLIP similarity) evaluated on the denoised image estimate, while backward guidance refines that estimate by optimizing it directly in clean-image space before mapping it back onto the noisy trajectory. Because the guidance networks are used as-is, no task-specific retraining of the diffusion model is required. A sketch of the forward step follows.
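A minimal sketch of the forward guidance step, assuming a DDIM-style sampler; eps_model, alphas_cumprod, and guidance_loss are illustrative names, not the repository's actual API:

    import torch

    def guided_step(z_t, t, eps_model, alphas_cumprod, guidance_loss, s=1.0):
        """One denoising step with forward universal guidance (sketch)."""
        z_t = z_t.detach().requires_grad_(True)
        a_t = alphas_cumprod[t]

        eps = eps_model(z_t, t)  # predicted noise at step t
        # Tweedie-style estimate of the clean image from the noisy sample.
        x0_hat = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()

        # Gradient of the external guidance loss on the clean-image estimate.
        grad = torch.autograd.grad(guidance_loss(x0_hat), z_t)[0]

        # Forward guidance: nudge the predicted noise along the loss gradient,
        # scaled by guidance weight s (analogous to classifier guidance).
        # Backward guidance would instead run a few gradient-descent steps on
        # x0_hat itself before recomputing the noise prediction.
        return (eps + s * (1 - a_t).sqrt() * grad).detach()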

Quick Start & Requirements

  • Installation: conda env create -f environment.yaml; conda activate ldm; conda install pytorch torchvision cudatoolkit=11.3 -c pytorch; pip install GPUtil facenet-pytorch blobfile.
  • Prerequisites: PyTorch, CUDA 11.3, the Stable Diffusion checkpoint (sd-v1-4.ckpt), and OpenAI's ImageNet Diffusion Model; a checkpoint load-check sketch follows this list.
  • Usage: Scripts are provided for various guidance types (Face Recognition, Segmentation, Object Detection, Style Transfer, CLIP guided). Example commands demonstrate setting text prompts, guidance weights, and diffusion steps.
  • Documentation: Examples are provided within the README for each guidance type.
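As a quick sanity check that the Stable Diffusion checkpoint downloaded correctly, something like the following can be run; the nested state_dict key is typical of SD v1 checkpoints, but the path and key here are assumptions, not guaranteed by this repository:

    import torch

    # Load the checkpoint on CPU to avoid touching GPU memory.
    ckpt = torch.load("sd-v1-4.ckpt", map_location="cpu")
    # SD v1 checkpoints usually nest the weights under "state_dict".
    state_dict = ckpt.get("state_dict", ckpt)
    print(f"loaded {len(state_dict)} tensors")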

Highlighted Details

  • Enables control via human identity, segmentation maps, object locations, and image style.
  • Integrates with Stable Diffusion and OpenAI's ImageNet Diffusion Model.
  • Offers forward and backward guidance mechanisms for flexible control.
  • CLIP guided generation allows for out-of-distribution image synthesis; a minimal CLIP-loss sketch follows this list.
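A minimal sketch of what a CLIP-based guidance loss could look like, assuming OpenAI's clip package; the prompt, resizing, and normalization choices are illustrative, and the repository may wire this up differently:

    import clip
    import torch
    import torch.nn.functional as F

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load("ViT-B/32", device=device)
    model = model.float()  # keep fp32; clip.load defaults to fp16 on CUDA
    with torch.no_grad():
        text_features = model.encode_text(
            clip.tokenize(["a watercolor painting of a fox"]).to(device)
        )

    def clip_guidance_loss(x0_hat):
        """Negative cosine similarity between the denoised image and the prompt."""
        # x0_hat: (B, 3, H, W) in [-1, 1]; map to CLIP's 224x224 input range.
        img = F.interpolate((x0_hat + 1) / 2, size=224, mode="bilinear")
        image_features = model.encode_image(img)  # CLIP's normalization omitted here
        return -F.cosine_similarity(image_features, text_features).mean()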

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Because the pipeline depends on the Stable Diffusion checkpoint and OpenAI's ImageNet Diffusion Model, their respective licenses still apply. Compatibility with commercial use is not specified.

Limitations & Caveats

The required model checkpoints (the Stable Diffusion sd-v1-4.ckpt and OpenAI's ImageNet Diffusion Model) must be downloaded separately. The README does not detail performance benchmarks or hardware requirements beyond CUDA 11.3.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days
