MultiDiffusion by omerbt

Framework for controllable image generation using pre-trained diffusion models

created 2 years ago
1,043 stars

Top 36.7% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

MultiDiffusion offers a unified framework for controllable image generation using pre-trained text-to-image diffusion models without requiring further training. It addresses the challenge of user control over generated images by fusing multiple diffusion processes with shared parameters, enabling adherence to spatial constraints like aspect ratios and segmentation masks. The target audience includes researchers and practitioners seeking flexible image generation capabilities.

How It Works

The core of MultiDiffusion is a novel generation process that formulates image synthesis as an optimization task. This task binds together multiple diffusion generation processes, allowing them to share parameters or adhere to common constraints. This approach enables fine-grained control over the output, such as generating images with specific aspect ratios (e.g., panoramas) or aligning generation with spatial guidance signals like segmentation masks or bounding boxes, all while leveraging existing diffusion models.
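The optimization described above has a simple closed-form solution: at each denoising step, every window (crop) of the canvas is denoised independently by the base model, and pixels covered by several windows take the per-pixel average of the overlapping results. A minimal sketch of that fusion step, assuming a hypothetical `denoise_fn` that stands in for one step of a pre-trained diffusion model (real latents are 4-D tensors; a 2-D array is used here for clarity):

```python
import numpy as np

def multidiffusion_step(latent, windows, denoise_fn):
    """One fused denoising step over a large canvas (sketch).

    latent     : 2-D array standing in for the full-canvas latent
    windows    : list of (y0, y1, x0, x1) crops that together cover the canvas
    denoise_fn : hypothetical stand-in for one step of the base diffusion model

    Each window is denoised independently; overlapping pixels are
    reconciled by averaging the per-window results.
    """
    acc = np.zeros_like(latent)
    count = np.zeros_like(latent)
    for y0, y1, x0, x1 in windows:
        out = denoise_fn(latent[y0:y1, x0:x1])  # base model on the crop
        acc[y0:y1, x0:x1] += out
        count[y0:y1, x0:x1] += 1.0
    return acc / np.maximum(count, 1.0)          # per-pixel average

# Toy run: two overlapping windows on a 4x4 canvas; the "denoiser" adds 1.
latent = np.zeros((4, 4))
windows = [(0, 4, 0, 3), (0, 4, 1, 4)]
fused = multidiffusion_step(latent, windows, lambda c: c + 1.0)
```

Because consistent per-window results agree in the overlap, the average stitches them into a seamless canvas; this is what lets the method generate panoramas far wider than the base model's training resolution without fine-tuning.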

Quick Start & Requirements

  • Installation: Integrated into Hugging Face diffusers.
  • Prerequisites: PyTorch, diffusers library, CUDA-enabled GPU (for torch.float16).
  • Demo: python app_gradio.py launches a Gradio UI.
  • Resources: Requires a pre-trained diffusion model checkpoint (e.g., stabilityai/stable-diffusion-2-base).
  • Links: Project Page, HuggingFace Demo, Spatial Controls Demo

Highlighted Details

  • Enables text-to-panorama generation.
  • Supports spatial controls via segmentation masks and bounding boxes.
  • Achieves controllable image generation without model fine-tuning.
  • Based on ICML 2023 paper "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation".

Maintenance & Community

  • Official PyTorch implementation.
  • Project led by Omer Bar-Tal.
  • Paper available on arXiv.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial use. While it leverages pre-trained models, the complexity of fusing multiple diffusion paths might introduce computational overhead or require careful parameter tuning for optimal results.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 15 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

Latent diffusion model for high-resolution image synthesis
0.1% · 41k stars · created 2 years ago · updated 1 month ago
Starred by Dan Abramov (core contributor to React), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

Latent text-to-image diffusion model
0.1% · 71k stars · created 3 years ago · updated 1 year ago