Universal-Guided-Diffusion by arpitbansal297

PyTorch code for universal diffusion guidance

Created 2 years ago
497 stars

Top 62.4% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of Universal Guidance for Diffusion Models, enabling control over image generation using arbitrary modalities like human identity, segmentation maps, object locations, and style without retraining. It targets researchers and developers working with diffusion models who need flexible conditioning beyond text prompts.

How It Works

The core approach modifies the diffusion sampling process to incorporate guidance signals alongside text conditioning. It builds on Stable Diffusion and OpenAI's ImageNet Diffusion Model, steering generation through forward and backward guidance mechanisms that use gradients of a guidance loss computed on the denoised image estimate. Because the guidance comes from off-the-shelf networks, the method avoids task-specific retraining and offers a generalized framework for diverse conditioning inputs.
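As a rough illustration, forward guidance can be sketched as perturbing the predicted noise with the gradient of a guidance loss evaluated on the denoised estimate. The following is a minimal NumPy toy, not the repository's API: the function names, the scalar noise schedule, and the quadratic loss are illustrative assumptions.

```python
import numpy as np

def predict_x0(x_t, eps_hat, alpha_bar):
    """Estimate the clean image from the noisy sample and predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)

def forward_guided_eps(x_t, eps_hat, alpha_bar, grad_loss, scale):
    """Forward guidance: shift the predicted noise along the gradient of
    the guidance loss, evaluated on the clean-image estimate."""
    x0_hat = predict_x0(x_t, eps_hat, alpha_bar)
    return eps_hat + scale * np.sqrt(1.0 - alpha_bar) * grad_loss(x0_hat)

# Toy usage: guide the sample toward a fixed target "image".
rng = np.random.default_rng(0)
target = np.ones(4)
x_t = rng.normal(size=4)
eps_hat = rng.normal(size=4)
alpha_bar = 0.5

grad_loss = lambda x0: x0 - target   # gradient of 0.5 * ||x0 - target||^2
eps_guided = forward_guided_eps(x_t, eps_hat, alpha_bar, grad_loss, scale=0.5)

# Denoising with the guided noise moves the estimate toward the target.
d_before = np.linalg.norm(predict_x0(x_t, eps_hat, alpha_bar) - target)
d_after = np.linalg.norm(predict_x0(x_t, eps_guided, alpha_bar) - target)
```

In a real sampler the same correction would be applied at every denoising step, with the guidance loss supplied by an off-the-shelf network (e.g., a face recognizer or segmentation model) rather than a quadratic.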

Quick Start & Requirements

  • Installation: create and activate the conda environment (conda env create -f environment.yaml, then conda activate ldm), install PyTorch with CUDA 11.3 (conda install pytorch torchvision cudatoolkit=11.3 -c pytorch), and install the remaining dependencies (pip install GPUtil facenet-pytorch blobfile).
  • Prerequisites: PyTorch, CUDA 11.3, Stable Diffusion checkpoint (sd-v1-4.ckpt), and OpenAI's ImageNet Diffusion Model.
  • Usage: Scripts are provided for each guidance type (face recognition, segmentation, object detection, style transfer, and CLIP-guided generation). Example commands in the README demonstrate setting text prompts, guidance weights, and the number of diffusion steps.
  • Documentation: Examples are provided within the README for each guidance type.
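Collected from the installation bullet above, the setup reads as a shell session (these are the README's environment-setup commands; the model checkpoints are downloaded separately):

```shell
# Create and activate the provided conda environment
conda env create -f environment.yaml
conda activate ldm

# Install PyTorch with CUDA 11.3 support
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch

# Extra dependencies used by the guidance scripts
pip install GPUtil facenet-pytorch blobfile
```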

Highlighted Details

  • Enables control via human identity, segmentation maps, object locations, and image style.
  • Integrates with Stable Diffusion and OpenAI's ImageNet Diffusion Model.
  • Offers forward and backward guidance mechanisms for flexible control.
  • CLIP guided generation allows for out-of-distribution image synthesis.
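For contrast with forward guidance, backward guidance can be sketched as directly optimizing a correction to the clean-image estimate against the guidance loss, then folding that correction back into the predicted noise. This is a self-contained NumPy toy under illustrative assumptions (scalar schedule, quadratic loss, plain gradient descent), not the repository's API.

```python
import numpy as np

def backward_guidance_delta(x0_hat, grad_loss, lr=0.1, steps=20):
    """Backward guidance: optimize a correction delta so that
    x0_hat + delta lowers the guidance loss (plain gradient descent)."""
    delta = np.zeros_like(x0_hat)
    for _ in range(steps):
        delta -= lr * grad_loss(x0_hat + delta)
    return delta

# Toy usage: pull the clean-image estimate toward a target "image".
target = np.full(4, 2.0)
x0_hat = np.zeros(4)
grad_loss = lambda x0: x0 - target   # gradient of 0.5 * ||x0 - target||^2

delta = backward_guidance_delta(x0_hat, grad_loss)
loss_before = 0.5 * np.sum((x0_hat - target) ** 2)
loss_after = 0.5 * np.sum((x0_hat + delta - target) ** 2)

# The correction can then be folded back into the predicted noise,
# e.g. eps_hat - np.sqrt(alpha_bar / (1 - alpha_bar)) * delta.
```

Because the correction is optimized rather than taken from a single gradient step, backward guidance can enforce the guidance signal more strongly at the cost of extra inner-loop compute per denoising step.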

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Because the project builds on Stable Diffusion and OpenAI's ImageNet Diffusion Model checkpoints, the licenses of those models also apply. Compatibility with commercial use is not specified.

Limitations & Caveats

The repository requires specific model checkpoints (Stable Diffusion and OpenAI's ImageNet Diffusion Model) to be downloaded separately. The README does not detail performance benchmarks or specific hardware requirements beyond CUDA.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days
