BoxDiff by showlab

Text-to-image synthesis research paper using box-constrained diffusion

Created 2 years ago

274 stars

Top 94.4% on SourcePulse

Project Summary

BoxDiff enables training-free, box-constrained text-to-image synthesis: users supply bounding boxes to control where objects appear in the generated image. It targets researchers and artists who need fine-grained control over diffusion models when composing scenes with specific spatial arrangements.

How It Works

BoxDiff adds spatial constraints to the diffusion process without retraining the model. It injects bounding box information into the cross-attention layers of pre-trained diffusion models such as Stable Diffusion and GLIGEN. Each selected text token is associated with a spatial region, which guides generation so that the corresponding object is placed inside that region.
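
To make the idea of tying a token's cross-attention to a box concrete, here is a minimal toy sketch in plain Python. It is not BoxDiff's actual objective: the function names, the 4x4 attention map, and the "fraction of attention mass outside the box" loss are all assumptions chosen for illustration. A guidance method could lower a quantity like this during denoising to pull a token's attention into its box.

```python
def box_mask(height, width, box):
    """Return a height x width grid of 0/1 flags; 1 marks cells inside the box.

    box is (x0, y0, x1, y1) in grid cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = box
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def inside_box_loss(attention, box):
    """Toy loss in [0, 1]: the share of attention mass that falls outside the box."""
    height, width = len(attention), len(attention[0])
    mask = box_mask(height, width, box)
    total = sum(sum(row) for row in attention)
    if total == 0:
        return 1.0  # no attention anywhere, so none of it is inside the box
    inside = sum(
        a * m
        for a_row, m_row in zip(attention, mask)
        for a, m in zip(a_row, m_row)
    )
    return 1.0 - inside / total


# Hypothetical 4x4 cross-attention map for a single text token.
attention = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.4, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
loss_good = inside_box_loss(attention, (1, 1, 3, 3))  # box covers the active region
loss_bad = inside_box_loss(attention, (0, 0, 1, 1))   # box covers an empty corner
```

In this toy setup the loss is near zero when the box covers the region the token attends to and near one when it does not, which is the kind of signal a box constraint needs.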

Quick Start & Requirements

Install via pip3 install -r requirements.txt.
Requires PyTorch==1.12.0.
CUDA is necessary for GPU acceleration.
For GLIGEN integration, a specific diffusers fork is required: git clone git@github.com:gligen/diffusers.git && cd diffusers && pip3 install -e .
Official examples and usage scripts (run_sd_boxdiff.py, run_gligen_boxdiff.py) are provided.
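
Because the requirements pin an exact PyTorch version, it can help to check the environment before running the scripts. The following is a small sketch using only the Python standard library; the helper names matches_pin and check_pin are hypothetical and not part of the repository, and treating a local build tag such as "+cu113" as matching the pin is an assumption.

```python
from importlib import metadata


def matches_pin(installed, pinned):
    """True if the installed version equals the pin, ignoring a local build tag like '+cu113'."""
    return installed.split("+", 1)[0] == pinned


def check_pin(package, pinned):
    """Return 'ok', 'mismatch', or 'missing' for one pinned package."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "missing"
    return "ok" if matches_pin(installed, pinned) else "mismatch"


# The pin below comes from the requirement listed above.
status = check_pin("torch", "1.12.0")
print("torch:", status)
```

A result of "mismatch" or "missing" suggests reinstalling from requirements.txt before trying run_sd_boxdiff.py or run_gligen_boxdiff.py.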

Highlighted Details

Implements box-constrained diffusion for Stable Diffusion and GLIGEN pipelines.
Supports specifying bounding boxes via coordinates or an interactive drawing interface.
Allows mapping specific text tokens to bounding boxes using token_indices.
Offers hyper-parameters P and L for controlling constraint strength.
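
The pairing of token_indices with bounding boxes can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the helpers word_positions and validate_boxes are hypothetical, whitespace splitting only approximates the real tokenizer (which may split words into several tokens), positions are taken as 1-based on the assumption that index 0 is reserved for a start-of-text token, and boxes are assumed to be [x0, y0, x1, y1] pixel coordinates on a 512x512 canvas. The prompt and box values are made up.

```python
def word_positions(prompt, words):
    """Map each word to its 1-based position in the whitespace-split prompt."""
    tokens = [t.strip(",.").lower() for t in prompt.split()]
    # list.index raises ValueError if a word is absent from the prompt.
    return [tokens.index(w.lower()) + 1 for w in words]


def validate_boxes(boxes, size=512):
    """Check that each [x0, y0, x1, y1] box is non-empty and inside a size x size canvas."""
    for box in boxes:
        x0, y0, x1, y1 = box
        if not (0 <= x0 < x1 <= size and 0 <= y0 < y1 <= size):
            raise ValueError(f"box {box} is empty or outside the {size}x{size} canvas")
    return boxes


# Hypothetical prompt and boxes, one box per constrained word.
prompt = "a rabbit wearing sunglasses looks very proud"
token_indices = word_positions(prompt, ["rabbit", "sunglasses"])
boxes = validate_boxes([[67, 87, 366, 512], [66, 130, 364, 262]])

# Each constrained token position is paired with the box it should occupy.
token_to_box = dict(zip(token_indices, boxes))
```

Here "rabbit" and "sunglasses" sit at positions 2 and 4 of the split prompt, so token_indices is [2, 4]. With a real tokenizer the indices should be read from the tokenized prompt instead.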

Maintenance & Community

Based on work from diffusers, Google, and yuval-alaluf.
The project is associated with ICCV 2023.
No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license.
Compatibility with commercial or closed-source applications is not specified.

Limitations & Caveats

The project pins PyTorch==1.12.0, so newer versions may have compatibility issues. The README states no license and lists no community channels, which may hinder adoption and long-term support.
