MultiDiffusion  by omerbt

Framework for controllable image generation using pre-trained diffusion models

Created 2 years ago
1,042 stars

Top 36.1% on SourcePulse

Project Summary

MultiDiffusion offers a unified framework for controllable image generation using pre-trained text-to-image diffusion models without requiring further training. It addresses the challenge of user control over generated images by fusing multiple diffusion processes with shared parameters, enabling adherence to spatial constraints like aspect ratios and segmentation masks. The target audience includes researchers and practitioners seeking flexible image generation capabilities.

How It Works

The core of MultiDiffusion is a novel generation process that formulates image synthesis as an optimization task. This task binds together multiple diffusion generation processes, allowing them to share parameters or adhere to common constraints. This approach enables fine-grained control over the output, such as generating images with specific aspect ratios (e.g., panoramas) or aligning generation with spatial guidance signals like segmentation masks or bounding boxes, all while leveraging existing diffusion models.
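For the panorama case, the fusion described above can be sketched as a per-pixel average over overlapping windows. This is a minimal illustration under stated assumptions, not the repository's code: `window_starts`, `fuse_step`, and `denoise_fn` are hypothetical names, `denoise_fn` stands in for one denoising step of a pre-trained model applied to a crop, and all windows are assumed to carry equal weight, in which case averaging is the least-squares reconciliation of the overlapping predictions.

```python
import numpy as np


def window_starts(size, window, stride):
    """Start offsets of windows covering [0, size), with a final window flush to the edge."""
    if size < window:
        raise ValueError("canvas is smaller than the window")
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)
    return starts


def fuse_step(latent, denoise_fn, window=64, stride=48):
    """Denoise every window independently, then average overlapping results per pixel."""
    height, width = latent.shape[-2:]
    value = np.zeros_like(latent, dtype=np.float64)  # sum of window predictions
    count = np.zeros_like(latent, dtype=np.float64)  # number of windows covering each pixel
    for top in window_starts(height, window, stride):
        for left in window_starts(width, window, stride):
            rows = slice(top, top + window)
            cols = slice(left, left + window)
            value[..., rows, cols] += denoise_fn(latent[..., rows, cols])
            count[..., rows, cols] += 1
    return value / count


# Toy usage: a (channels, height, width) latent and a stand-in "denoiser".
latent = np.random.default_rng(0).normal(size=(4, 64, 160))
fused = fuse_step(latent, lambda crop: 0.5 * crop)
```

In a full sampler this step would be repeated at every timestep, so that all windows stay consistent with a single shared canvas.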

Quick Start & Requirements

  • Installation: Integrated into Hugging Face diffusers.
  • Prerequisites: PyTorch, diffusers library, CUDA-enabled GPU (for torch.float16).
  • Demo: python app_gradio.py launches a Gradio UI.
  • Resources: Requires a pre-trained diffusion model checkpoint (e.g., stabilityai/stable-diffusion-2-base).
  • Links: Project Page, HuggingFace Demo, Spatial Controls Demo
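A text-to-panorama call through diffusers might look like the sketch below. It assumes an installed diffusers release that still ships `StableDiffusionPanoramaPipeline`, a CUDA GPU, and network access to download the checkpoint; the wrapper name `generate_panorama` and the output filename are hypothetical.

```python
def generate_panorama(prompt, model_ckpt="stabilityai/stable-diffusion-2-base",
                      height=512, width=2048):
    """Generate one wide image with the MultiDiffusion panorama pipeline in diffusers."""
    # Imported lazily so that defining this function needs neither torch nor a GPU.
    import torch
    from diffusers import DDIMScheduler, StableDiffusionPanoramaPipeline

    scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
    pipe = StableDiffusionPanoramaPipeline.from_pretrained(
        model_ckpt, scheduler=scheduler, torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt, height=height, width=width).images[0]


# Example (downloads the checkpoint and requires a GPU):
# image = generate_panorama("a photo of the dolomites")
# image.save("dolomites.png")
```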

Highlighted Details

  • Enables text-to-panorama generation.
  • Supports spatial controls via segmentation masks and bounding boxes.
  • Achieves controllable image generation without model fine-tuning.
  • Based on ICML 2023 paper "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation".
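The spatial controls listed above can be illustrated with a mask-weighted fusion: each region (for example, one prompt per mask) produces its own denoised prediction, and the predictions are blended per pixel using the masks as weights. This is a sketch under assumptions, not the repository's implementation; `box_mask` and `fuse_regions` are hypothetical helpers, and the arrays stand in for model outputs.

```python
import numpy as np


def box_mask(height, width, top, left, bottom, right):
    """Binary mask that is 1 inside the bounding box and 0 elsewhere."""
    mask = np.zeros((height, width), dtype=np.float64)
    mask[top:bottom, left:right] = 1.0
    return mask


def fuse_regions(region_preds, masks):
    """Per-pixel weighted average of per-region predictions, using masks as weights."""
    preds = np.stack(region_preds)        # (regions, channels, height, width)
    weights = np.stack(masks)[:, None]    # (regions, 1, height, width)
    total = weights.sum(axis=0)
    if np.any(total <= 0):
        raise ValueError("every pixel must be covered by at least one mask")
    return (weights * preds).sum(axis=0) / total


# Toy usage: two boxes that overlap in columns 3..4 of an 8x8 canvas.
left_box = box_mask(8, 8, 0, 0, 8, 5)
right_box = box_mask(8, 8, 0, 3, 8, 8)
pred_a = np.full((4, 8, 8), 1.0)  # stand-in prediction for region A
pred_b = np.full((4, 8, 8), 3.0)  # stand-in prediction for region B
fused = fuse_regions([pred_a, pred_b], [left_box, right_box])
```

Where only one mask is active the fused value follows that region's prediction; where masks overlap, the predictions are averaged.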

Maintenance & Community

  • Official PyTorch implementation.
  • Project led by Omer Bar-Tal.
  • Paper available on arXiv.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

No license is stated in the README, so the terms for commercial use are unclear. Although no training is needed, fusing multiple diffusion paths may add computational overhead, since the pre-trained model is evaluated on several overlapping crops or regions at each denoising step, and results may need careful parameter tuning.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 3 more.

Hotshot-XL by hotshotco

0%
1k
Text-to-GIF model for Stable Diffusion XL
Created 1 year ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Chaoyu Yang (Founder of Bento), and 11 more.

IF by deep-floyd

0.0%
8k
Text-to-image model for photorealistic synthesis and language understanding
Created 2 years ago
Updated 1 year ago