mixture-of-diffusers  by albarji

Image generation method for scene composition using multiple diffusion processes

Created 3 years ago
445 stars

Top 67.5% on SourcePulse

Project Summary

This repository provides tools for image composition and high-resolution generation that combine multiple diffusion processes on a single canvas. It targets users who need finer control over image layout than standard text-to-image models offer. The canvas is divided into regions, each handled by its own diffusion process with its own prompt and settings, and neighboring regions are blended to avoid visible seams.

How It Works

The project extends the Hugging Face Diffusers library with two pipelines: StableDiffusionTilingPipeline and StableDiffusionCanvasPipeline. They let users define a grid of tiles or arbitrary regions across the image canvas. Each region runs its own diffusion process, configured with its own prompt, guidance scale, and seed. Overlap parameters control how adjacent regions blend into each other, which reduces visible seams at the boundaries. StableDiffusionCanvasPipeline additionally supports image-to-image guidance within specific regions, which allows more complex compositions and style transfer.
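The role of the overlap parameters can be illustrated with a small, self-contained sketch. It is not the repository's code: the function names (tile_bounds, canvas_size, blend_row) are hypothetical, and the triangular weighting is only an assumed stand-in, since the summary above does not say which weighting the pipelines use. The sketch computes where each tile sits along one axis and blends one value per tile by weighted averaging, so that overlapping pixels change gradually from one tile to the next.

```python
from typing import List, Tuple


def tile_bounds(index: int, tile_size: int, overlap: int) -> Tuple[int, int]:
    """Return the [start, end) pixel span of tile `index` along one axis."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must satisfy 0 <= overlap < tile_size")
    # Each new tile advances by the tile size minus the shared overlap.
    start = index * (tile_size - overlap)
    return start, start + tile_size


def canvas_size(n_tiles: int, tile_size: int, overlap: int) -> int:
    """Total length covered by `n_tiles` overlapping tiles along one axis."""
    return tile_bounds(n_tiles - 1, tile_size, overlap)[1]


def blend_row(tile_values: List[float], tile_size: int, overlap: int) -> List[float]:
    """Blend one constant value per tile into a single row of pixels.

    Assumption: each tile's weight grows linearly with the distance to its
    nearest edge (triangular weights). Pixels covered by several tiles get
    the weighted average of those tiles' values.
    """
    width = canvas_size(len(tile_values), tile_size, overlap)
    total = [0.0] * width
    weight_sum = [0.0] * width
    for i, value in enumerate(tile_values):
        start, end = tile_bounds(i, tile_size, overlap)
        for px in range(start, end):
            weight = min(px - start, end - 1 - px) + 1
            total[px] += weight * value
            weight_sum[px] += weight
    return [t / w for t, w in zip(total, weight_sum)]


# Example: two tiles of 8 pixels sharing 4 pixels, with values 0.0 and 1.0.
row = blend_row([0.0, 1.0], tile_size=8, overlap=4)
print(row)
```

In this toy setting the pixels covered by only one tile keep that tile's value, while the four shared pixels ramp from the first value towards the second. A real pipeline would apply the same idea to model outputs at every denoising step rather than to constant values.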

Quick Start & Requirements

  • Installation: Requires Python and the diffusers library.
  • Prerequisites: Access to Stable Diffusion models (requires Hugging Face user access token), PyTorch, and a CUDA-enabled GPU.
  • Usage: Python code examples are provided for both StableDiffusionTilingPipeline and StableDiffusionCanvasPipeline, demonstrating how to define prompts, regions, and generation parameters.
  • Documentation: Full technical details are available in the linked paper: https://arxiv.org/abs/2302.02412.

Highlighted Details

  • Enables complex scene composition by assigning specific prompts and settings to distinct image regions.
  • Supports both grid-based tiling (StableDiffusionTilingPipeline) and flexible, arbitrary region placement (StableDiffusionCanvasPipeline).
  • StableDiffusionCanvasPipeline includes image-to-image capabilities for regional guidance.
  • Offers options for CPU VAE offloading (cpu_vae=True) to manage GPU memory for large images.
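
The per-region settings listed above can be pictured as a small data structure. The sketch below is an illustration only: RegionSpec and uncovered_cells are hypothetical names, not classes or functions from the repository, and the default guidance scale and seed are arbitrary. It assumes regions are rectangles aligned to the 8-pixel latent grid of Stable Diffusion's VAE, and it checks that a set of regions leaves no part of the canvas uncovered.

```python
from dataclasses import dataclass
from typing import List

# Stable Diffusion's VAE downsamples each spatial dimension by a factor of 8.
LATENT_FACTOR = 8


@dataclass(frozen=True)
class RegionSpec:
    """Hypothetical description of one rectangular canvas region."""
    row_init: int
    row_end: int
    col_init: int
    col_end: int
    prompt: str
    guidance_scale: float = 7.5  # arbitrary default chosen for illustration
    seed: int = 0                # arbitrary default chosen for illustration

    def validate(self, canvas_height: int, canvas_width: int) -> None:
        """Raise ValueError if the region is empty, out of bounds, or misaligned."""
        if not 0 <= self.row_init < self.row_end <= canvas_height:
            raise ValueError("region rows are empty or outside the canvas")
        if not 0 <= self.col_init < self.col_end <= canvas_width:
            raise ValueError("region columns are empty or outside the canvas")
        for value in (self.row_init, self.row_end, self.col_init, self.col_end):
            if value % LATENT_FACTOR != 0:
                raise ValueError("region bounds must be multiples of 8 pixels")

    def overlaps(self, other: "RegionSpec") -> bool:
        """True if the two rectangles share at least one pixel."""
        return (self.row_init < other.row_end and other.row_init < self.row_end
                and self.col_init < other.col_end and other.col_init < self.col_end)


def uncovered_cells(regions: List[RegionSpec], canvas_height: int, canvas_width: int) -> int:
    """Count latent-grid cells of the canvas that no region covers."""
    rows = canvas_height // LATENT_FACTOR
    cols = canvas_width // LATENT_FACTOR
    covered = [[False] * cols for _ in range(rows)]
    for region in regions:
        region.validate(canvas_height, canvas_width)
        for r in range(region.row_init // LATENT_FACTOR, region.row_end // LATENT_FACTOR):
            for c in range(region.col_init // LATENT_FACTOR, region.col_end // LATENT_FACTOR):
                covered[r][c] = True
    return sum(row.count(False) for row in covered)


# Example: a sky region and a ground region that overlap in a horizontal band.
sky = RegionSpec(0, 512, 0, 768, prompt="a dramatic sunset sky")
ground = RegionSpec(384, 640, 0, 768, prompt="a field of wheat", guidance_scale=8.0, seed=42)
print(sky.overlaps(ground), uncovered_cells([sky, ground], 640, 768))
```

Keeping the prompt, guidance scale, and seed next to the rectangle makes it easy to vary one region without touching the others, which is the kind of control the highlighted features describe.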

Maintenance & Community

The project is maintained by albarji. Acknowledgements mention the Stable Diffusion team, Hugging Face, and specific research institutions for GPU resources. Community interaction points are not explicitly listed in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it relies on Stable Diffusion models, which have their own usage terms and require Hugging Face authentication, implying potential restrictions on commercial use depending on the underlying model licenses.

Limitations & Caveats

The project requires significant GPU resources and familiarity with the Diffusers library. While it aims to reduce memory usage for large images via CPU VAE offloading, generating very high-resolution images or using many complex regions may still be memory-intensive. The README does not detail specific performance benchmarks or known bugs.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
