MultiDiffusion by omerbt

Framework for controllable image generation using pre-trained diffusion models

Created 2 years ago

1,052 stars

Top 35.9% on SourcePulse

1 Expert Loves This Project

Chenlin Meng

Cofounder of Pika

Project Summary

MultiDiffusion offers a unified framework for controllable image generation using pre-trained text-to-image diffusion models without requiring further training. It addresses the challenge of user control over generated images by fusing multiple diffusion processes with shared parameters, enabling adherence to spatial constraints like aspect ratios and segmentation masks. The target audience includes researchers and practitioners seeking flexible image generation capabilities.

How It Works

The core of MultiDiffusion is a novel generation process that formulates image synthesis as an optimization task. This task binds together multiple diffusion generation processes, allowing them to share parameters or adhere to common constraints. This approach enables fine-grained control over the output, such as generating images with specific aspect ratios (e.g., panoramas) or aligning generation with spatial guidance signals like segmentation masks or bounding boxes, all while leveraging existing diffusion models.
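The fusion idea can be illustrated with a toy sketch. It assumes the simplest case, in which overlapping crops of a wide canvas are denoised independently and then averaged with uniform weights wherever they overlap. The names `sliding_windows`, `fuse_step`, and `denoise_fn` are hypothetical and are not taken from the project's code; `denoise_fn` stands in for one denoising step of a pre-trained model.

```python
import numpy as np


def sliding_windows(width, window, stride):
    """Return (start, end) column ranges that cover [0, width) with overlap."""
    if window > width or stride > window:
        raise ValueError("need window <= width and stride <= window")
    starts = list(range(0, width - window + 1, stride))
    # Add a final window flush with the right edge if the stride left a gap.
    if starts[-1] + window < width:
        starts.append(width - window)
    return [(s, s + window) for s in starts]


def fuse_step(canvas, denoise_fn, windows):
    """One fused update: denoise every crop, then average overlapping results.

    canvas     -- float array of shape (channels, height, width)
    denoise_fn -- hypothetical stand-in for a single model denoising step
    windows    -- list of (start, end) column ranges
    """
    value = np.zeros_like(canvas)  # running sum of per-crop predictions
    count = np.zeros_like(canvas)  # how many crops touched each position
    for start, end in windows:
        value[..., start:end] += denoise_fn(canvas[..., start:end])
        count[..., start:end] += 1.0
    # Uniform-weight average: every crop gets an equal vote at each position.
    return value / count


# Usage with a dummy "denoiser" that simply halves its input.
canvas = np.random.default_rng(0).standard_normal((4, 8, 16))
fused = fuse_step(canvas, lambda crop: 0.5 * crop, sliding_windows(16, 8, 4))
```

Because every crop is processed by the same model and the results are reconciled into a single canvas at each step, neighbouring regions stay consistent even though the model itself only ever sees fixed-size inputs.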

Quick Start & Requirements

Installation: Integrated into Hugging Face diffusers.
Prerequisites: PyTorch, diffusers library, CUDA-enabled GPU (for torch.float16).
Demo: python app_gradio.py launches a Gradio UI.
Resources: Requires a pre-trained diffusion model checkpoint (e.g., stabilityai/stable-diffusion-2-base).
Links: Project Page, HuggingFace Demo, Spatial Controls Demo
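As a rough sketch of the diffusers route, the snippet below wraps text-to-panorama generation in a helper. It assumes an installed diffusers version that still ships `StableDiffusionPanoramaPipeline` (the class may be deprecated or relocated in newer releases) and pairs it with a DDIM scheduler, which is a choice made here rather than something stated above. The helper name `generate_panorama` is hypothetical.

```python
def generate_panorama(prompt, model_ckpt="stabilityai/stable-diffusion-2-base", device="cuda"):
    """Generate a panorama image for `prompt` and return it as a PIL image."""
    # Imported lazily so the helper can be defined on machines that lack
    # torch/diffusers or a GPU; the heavy work only happens when it is called.
    import torch
    from diffusers import DDIMScheduler, StableDiffusionPanoramaPipeline

    # Assumption: a DDIM scheduler loaded from the same checkpoint.
    scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
    pipe = StableDiffusionPanoramaPipeline.from_pretrained(
        model_ckpt, scheduler=scheduler, torch_dtype=torch.float16
    )
    pipe = pipe.to(device)  # float16 weights expect a CUDA-enabled GPU
    return pipe(prompt).images[0]
```

Calling `generate_panorama("a photo of the dolomites").save("dolomites.png")` would download the checkpoint on first use and write the result to disk; the prompt and file name are only examples.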

Highlighted Details

Enables text-to-panorama generation.
Supports spatial controls via segmentation masks and bounding boxes.
Achieves controllable image generation without model fine-tuning.
Based on ICML 2023 paper "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation".

Maintenance & Community

Official PyTorch implementation.
Project led by Omer Bar-Tal.
Paper available on arXiv.

Licensing & Compatibility

License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because the README does not state a license, the terms for commercial use are unclear. Although no training or fine-tuning is needed, fusing multiple diffusion paths typically means running the pre-trained model on several regions at every denoising step, which may add computational overhead, and good results may require parameter tuning.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days