mixture-of-diffusers by albarji

Image generation method for scene composition using multiple diffusion processes

Created 3 years ago

447 stars

Top 67.2% on SourcePulse

2 Experts Love This Project

Omar Sanseviero

DevRel at Google DeepMind

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

Project Summary

This repository provides tools for image composition and high-resolution generation using multiple diffusion processes. It targets users who need more precise control over image layout than standard text-to-image pipelines offer. The generation canvas is divided into regions, each handled by its own diffusion process with its own prompt and settings, and the regions are blended where they overlap to avoid visible seams.

How It Works

The project extends the Hugging Face Diffusers library with two pipelines, StableDiffusionTilingPipeline and StableDiffusionCanvasPipeline. These pipelines let users define a grid of tiles or arbitrary regions across the image canvas. Each region is handled by a separate diffusion process, configured with its own prompt, guidance scale, and seed. Overlap parameters control how much adjacent regions share, which smooths the transitions between them and reduces visible seams. StableDiffusionCanvasPipeline additionally supports image-to-image guidance within specific regions, allowing more complex compositions and style transfer.
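The blending of overlapping regions can be pictured with a small, self-contained sketch. It is not the repository's code: it works on a 1-D canvas with plain Python lists, the function names (gaussian_weights, blend_tiles) are hypothetical, and the Gaussian weighting with sigma_frac=0.25 is an assumed, illustrative choice. Each tile contributes a prediction for the positions it covers, and positions covered by more than one tile receive a weighted average.

```python
import math


def gaussian_weights(length, sigma_frac=0.25):
    """Weights that peak at the tile centre and fall off toward its edges.

    sigma_frac is an illustrative assumption, not a value from the project.
    """
    centre = (length - 1) / 2
    sigma = max(length * sigma_frac, 1e-6)
    return [math.exp(-((i - centre) ** 2) / (2 * sigma ** 2)) for i in range(length)]


def blend_tiles(canvas_len, tiles):
    """Blend per-tile predictions on a 1-D canvas by weighted averaging.

    tiles is a list of (start, values) pairs, where values holds one
    prediction for each canvas position the tile covers.
    """
    total = [0.0] * canvas_len
    weight_sum = [0.0] * canvas_len
    for start, values in tiles:
        weights = gaussian_weights(len(values))
        for offset, (value, weight) in enumerate(zip(values, weights)):
            total[start + offset] += weight * value
            weight_sum[start + offset] += weight
    # Positions covered by no tile keep a value of 0.0.
    return [t / w if w > 0 else 0.0 for t, w in zip(total, weight_sum)]


# Two tiles of length 6 that overlap on canvas positions 4 and 5.
left = (0, [1.0] * 6)
right = (4, [3.0] * 6)
blended = blend_tiles(10, [left, right])
print([round(v, 3) for v in blended])
```

Outside the overlap each position keeps its own tile's value; inside it, the result moves gradually from the left tile's value toward the right tile's, which is the effect the overlap parameters are meant to achieve.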

Quick Start & Requirements

Installation: Requires Python and the diffusers library.
Prerequisites: Access to Stable Diffusion models (requires Hugging Face user access token), PyTorch, and a CUDA-enabled GPU.
Usage: Python code examples are provided for both StableDiffusionTilingPipeline and StableDiffusionCanvasPipeline, demonstrating how to define prompts, regions, and generation parameters.
Documentation: Full technical details are available in the linked paper: https://arxiv.org/abs/2302.02412.
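As a rough aid for choosing generation parameters, the sketch below computes the canvas size and tile positions implied by a grid of overlapping tiles. The helper names (canvas_size, tile_boxes) and argument names are hypothetical and chosen for illustration; the project's README should be consulted for the pipelines' actual parameters.

```python
def canvas_size(rows, cols, tile_height, tile_width, row_overlap, col_overlap):
    """Return (height, width) in pixels of a grid of overlapping tiles."""
    if not (0 <= row_overlap < tile_height and 0 <= col_overlap < tile_width):
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    # Every tile after the first adds only its non-overlapping part.
    height = tile_height + (rows - 1) * (tile_height - row_overlap)
    width = tile_width + (cols - 1) * (tile_width - col_overlap)
    return height, width


def tile_boxes(rows, cols, tile_height, tile_width, row_overlap, col_overlap):
    """Return (top, bottom, left, right) pixel boxes for each tile, row by row."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            top = r * (tile_height - row_overlap)
            left = c * (tile_width - col_overlap)
            boxes.append((top, top + tile_height, left, left + tile_width))
    return boxes


# One row of three 640x640 tiles, each sharing 256 columns with its neighbour.
size = canvas_size(1, 3, 640, 640, 0, 256)
boxes = tile_boxes(1, 3, 640, 640, 0, 256)
print(size)
print(boxes)
```

Larger overlaps give the blending more room to work but shrink the canvas gained per extra tile, so the overlap is a trade-off between seam quality and total image size.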

Highlighted Details

Enables complex scene composition by assigning specific prompts and settings to distinct image regions.
Supports both grid-based tiling (StableDiffusionTilingPipeline) and flexible, arbitrary region placement (StableDiffusionCanvasPipeline).
StableDiffusionCanvasPipeline includes image-to-image capabilities for regional guidance.
Offers an option to run the VAE on the CPU (cpu_vae=True), which reduces GPU memory use when generating large images.
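To make the idea of per-region prompts and settings concrete, here is a minimal sketch of how a region could be described and checked. The Region class and overlap function are hypothetical illustrations, not the repository's own classes. The factor of 8 reflects the downsampling of the Stable Diffusion v1 VAE, so pixel bounds are assumed here to be multiples of 8.

```python
from dataclasses import dataclass
from typing import Optional

LATENT_SCALE = 8  # Stable Diffusion v1 latents are 1/8 of the pixel resolution


@dataclass(frozen=True)
class Region:
    """Hypothetical description of one canvas region (pixel coordinates)."""
    row_init: int
    row_end: int
    col_init: int
    col_end: int
    prompt: str
    guidance_scale: float = 7.5
    seed: Optional[int] = None

    def __post_init__(self):
        bounds = (self.row_init, self.row_end, self.col_init, self.col_end)
        if self.row_init >= self.row_end or self.col_init >= self.col_end:
            raise ValueError("region must have positive height and width")
        if any(b < 0 or b % LATENT_SCALE for b in bounds):
            raise ValueError("bounds must be non-negative multiples of 8")

    def latent_box(self):
        """Return the region bounds in latent-space coordinates."""
        return tuple(b // LATENT_SCALE for b in
                     (self.row_init, self.row_end, self.col_init, self.col_end))


def overlap(a, b):
    """Return the pixel box shared by two regions, or None if they are disjoint."""
    top, bottom = max(a.row_init, b.row_init), min(a.row_end, b.row_end)
    left, right = max(a.col_init, b.col_init), min(a.col_end, b.col_end)
    if top >= bottom or left >= right:
        return None
    return (top, bottom, left, right)


house = Region(0, 640, 0, 640, prompt="a house in the countryside", seed=1)
road = Region(0, 640, 384, 1024, prompt="a dirt road", guidance_scale=8.0)
print(house.latent_box())
print(overlap(house, road))
```

Keeping the prompt, guidance scale, and seed on the region itself mirrors the summary's description of each region carrying its own settings.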

Maintenance & Community

The project is maintained by albarji. Acknowledgements mention the Stable Diffusion team, Hugging Face, and specific research institutions for GPU resources. Community interaction points are not explicitly listed in the README.

Licensing & Compatibility

The README does not explicitly state the repository's license. The project relies on Stable Diffusion models, which carry their own usage terms and require Hugging Face authentication, so commercial use may be restricted depending on the license of the underlying model.

Limitations & Caveats

The project requires significant GPU resources and familiarity with the Diffusers library. While it aims to reduce memory usage for large images via CPU VAE offloading, generating very high-resolution images or using many complex regions may still be memory-intensive. The README does not detail specific performance benchmarks or known bugs.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days