BoxDiff by showlab

Research code for text-to-image synthesis with training-free, box-constrained diffusion

Created 2 years ago
271 stars

Top 95.0% on SourcePulse

Project Summary

BoxDiff enables training-free, box-constrained text-to-image synthesis, letting users control where objects appear and how a scene is composed. It targets researchers and artists who want fine-grained spatial control over diffusion models, particularly when generating complex scenes with specific object arrangements.

How It Works

BoxDiff applies spatial constraints during the denoising process, so the pre-trained model needs no retraining. It defines Inner-Box, Outer-Box, and Corner constraints on the cross-attention maps of selected text tokens and integrates them into the denoising steps of pre-trained pipelines such as Stable Diffusion and GLIGEN. Because each constrained token is tied to a bounding box, the corresponding object is steered toward that region of the image.
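To make the mechanism more concrete, the following is a simplified, pure-Python sketch of the kind of penalty computed on a single token's cross-attention map. It loosely follows the inner-box and outer-box ideas from the paper (the corner constraint is omitted). The function names, the top-k `ratio` parameter, and the assumption that attention values are normalized to [0, 1] are illustrative choices, not the repository's code. In the actual method, penalties like these guide the denoising of the latent; this sketch only computes the penalty values.

```python
import math
from typing import List, Sequence, Tuple

# Box in attention-map cells: (x0, y0, x1, y1), half-open on the right/bottom.
Box = Tuple[int, int, int, int]


def top_k_mean(values: Sequence[float], ratio: float) -> float:
    """Mean of the largest ceil(ratio * n) values (at least one)."""
    if not values:
        return 0.0
    k = max(1, math.ceil(ratio * len(values)))
    return sum(sorted(values, reverse=True)[:k]) / k


def box_losses(attn: List[List[float]], box: Box, ratio: float = 0.5) -> Tuple[float, float]:
    """Hypothetical inner/outer penalties for one token (values assumed in [0, 1])."""
    x0, y0, x1, y1 = box
    inside: List[float] = []
    outside: List[float] = []
    for y, row in enumerate(attn):
        for x, value in enumerate(row):
            (inside if x0 <= x < x1 and y0 <= y < y1 else outside).append(value)
    inner = 1.0 - top_k_mean(inside, ratio)  # small when the box holds strong responses
    outer = top_k_mean(outside, ratio)       # small when little attention leaks outside
    return inner, outer


# Toy 4x4 attention map whose strong responses sit in the top-left corner.
attn = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]
print(box_losses(attn, (0, 0, 2, 2)))  # object inside its box: both penalties low
print(box_losses(attn, (2, 2, 4, 4)))  # box in the wrong place: both penalties high
```

Averaging only the top-k responses, rather than the whole region, means a box does not have to be filled uniformly; it only needs some strong activations inside and few strong ones outside.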

Quick Start & Requirements

  • Install via pip3 install -r requirements.txt.
  • Requires PyTorch==1.12.0.
  • CUDA is necessary for GPU acceleration.
  • For GLIGEN integration, a specific diffusers fork is required: git clone git@github.com:gligen/diffusers.git && cd diffusers && pip3 install -e .
  • Official examples and usage scripts (run_sd_boxdiff.py, run_gligen_boxdiff.py) are provided.
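Because only PyTorch 1.12.0 is named, a quick environment check before running the scripts can save debugging time. The sketch below is a hypothetical helper, not part of the repository. It reads the installed torch version through the standard library's importlib.metadata (so the check works even if torch itself fails to import), ignores local build suffixes such as "+cu113", and reports CUDA availability via torch.cuda.is_available() when torch can be imported.

```python
from importlib import metadata
from typing import Optional

TESTED_TORCH = "1.12.0"  # the PyTorch version named for BoxDiff


def installed_torch() -> Optional[str]:
    """Installed torch version string, or None if the package is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def describe(installed: Optional[str], expected: str = TESTED_TORCH) -> str:
    """Compare an installed version with the tested one, ignoring '+cuXXX' tags."""
    if installed is None:
        return "torch is not installed"
    base = installed.split("+", 1)[0]
    if base == expected:
        return f"torch {installed} matches the tested {expected}"
    return f"torch {installed} differs from the tested {expected}"


def cuda_status() -> str:
    """Report CUDA availability; torch is only imported when this is called."""
    try:
        import torch
    except ImportError:
        return "torch is not importable"
    return "CUDA available" if torch.cuda.is_available() else "CUDA not available"


if __name__ == "__main__":
    print(describe(installed_torch()))
    print(cuda_status())
```

A mismatch is only a warning sign, not proof of breakage; newer PyTorch releases may still work, but they are outside the configuration the project names.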

Highlighted Details

  • Implements box-constrained diffusion for Stable Diffusion and GLIGEN pipelines.
  • Supports specifying bounding boxes via coordinates or an interactive drawing interface.
  • Allows mapping specific text tokens to bounding boxes using token_indices.
  • Offers hyper-parameters P and L for controlling constraint strength.
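The pairing of text tokens with boxes can be pictured with a small, hypothetical helper that maps prompt words to token positions and rescales pixel boxes onto a cross-attention grid. Every detail here is an assumption for illustration, not taken from the repository: one token per whitespace-separated word (the real CLIP tokenizer can split a word into several sub-word tokens), index 0 reserved for the start-of-text token as in CLIP, a 512x512 image, and a 16x16 attention grid. Real token_indices should be read off the tokenizer the pipeline actually uses.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def word_token_indices(prompt: str) -> Dict[str, int]:
    """Hypothetical: word -> token position, with 0 reserved for start-of-text.

    Assumes one token per word; a repeated word keeps its last position.
    """
    return {word: i + 1 for i, word in enumerate(prompt.lower().split())}


def to_attention_grid(box: Box, image_size: int = 512, grid: int = 16) -> Box:
    """Rescale a pixel box to attention-map cells, rounding outward."""
    scale = grid / image_size
    x0, y0, x1, y1 = box
    return (math.floor(x0 * scale), math.floor(y0 * scale),
            math.ceil(x1 * scale), math.ceil(y1 * scale))


def build_constraints(prompt: str, boxes: Dict[str, Box]) -> Tuple[List[int], List[Box]]:
    """Return parallel lists: token index and grid box for each boxed word."""
    positions = word_token_indices(prompt)
    token_indices = [positions[word] for word in boxes]
    grid_boxes = [to_attention_grid(box) for box in boxes.values()]
    return token_indices, grid_boxes


token_indices, grid_boxes = build_constraints(
    "a cat and a dog",
    {"cat": (0, 128, 256, 512), "dog": (256, 128, 512, 512)},
)
print(token_indices)  # [2, 5]
print(grid_boxes)     # [(0, 4, 8, 16), (8, 4, 16, 16)]
```

Rounding outward keeps small boxes from collapsing to zero cells on a coarse grid; the repository's own scripts may handle box scaling differently.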

Maintenance & Community

  • Based on work from diffusers, Google, and yuval-alaluf.
  • The project is associated with ICCV 2023.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility with commercial or closed-source applications is not specified.

Limitations & Caveats

The project explicitly tests only with PyTorch==1.12.0, suggesting potential compatibility issues with newer versions. The lack of a specified license and community channels may hinder adoption and long-term support.

Health Check
Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
2 stars in the last 30 days

Explore Similar Projects

Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

0.0%
3k
Multilingual text-to-image latent diffusion model
Created 2 years ago
Updated 1 year ago
Starred by Robin Huang (Cofounder of Comfy Org), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 17 more.

stablediffusion by Stability-AI

0.1%
42k
Latent diffusion model for high-resolution image synthesis
Created 2 years ago
Updated 2 months ago