Research code for text-to-image synthesis using box-constrained diffusion
BoxDiff enables training-free, box-constrained text-to-image synthesis, allowing users to precisely control object placement and composition. It targets researchers and artists seeking fine-grained control over diffusion models, offering a significant advantage in generating complex scenes with specific spatial arrangements.
How It Works
BoxDiff integrates spatial constraints directly into the diffusion process without requiring model retraining. It leverages a novel conditioning mechanism that injects bounding box information into the cross-attention layers of pre-trained diffusion models like Stable Diffusion and GLIGEN. This approach allows specific text tokens to be associated with spatial regions, guiding the generation process to place corresponding objects accurately.
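As a minimal sketch of this idea (hypothetical code, not the repository's implementation; the helper box_attention_loss and its single-term loss are simplifications of the paper's constraints), one can score how well a token's cross-attention map respects its target box and differentiate that score:

```python
import torch

def box_attention_loss(attn_map: torch.Tensor, box: tuple) -> torch.Tensor:
    # attn_map: (H, W) cross-attention weights for one text token.
    # box: (x0, y0, x1, y1) target region in attention-map coordinates.
    x0, y0, x1, y1 = box
    mask = torch.zeros_like(attn_map)
    mask[y0:y1, x0:x1] = 1.0
    inside = (attn_map * mask).sum()   # attention mass inside the box
    total = attn_map.sum() + 1e-8      # avoid division by zero
    return 1.0 - inside / total        # low when mass concentrates in the box

# Toy check: a random non-negative "attention map" with gradients enabled.
attn = torch.rand(16, 16, requires_grad=True)
loss = box_attention_loss(attn, (2, 2, 10, 10))
loss.backward()
print(float(loss))
```

In a real pipeline, a gradient of this kind would be applied to the diffusion latent at each denoising step (classifier-guidance style) rather than to a standalone attention map, so no retraining of the model is needed.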
Quick Start & Requirements
Install dependencies with pip3 install -r requirements.txt. A diffusers fork is required: git clone git@github.com:gligen/diffusers.git && cd diffusers && pip3 install -e . (the editable install must run inside the cloned directory). Example scripts (run_sd_boxdiff.py and run_gligen_boxdiff.py) are provided.

Highlighted Details
Bounding boxes are associated with specific prompt tokens via token_indices. The hyperparameters P and L control the strength of the spatial constraints.
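As a rough illustration of how P might act as a strength knob (the semantics and names below are assumptions inferred from the description above, not the repository's code; L is omitted):

```python
import torch

# Assumed setup: each entry of token_indices points at a prompt token
# that gets its own box, and P keeps only the strongest fraction of
# in-box attention responses.
token_indices = [2, 5]                         # e.g. the "cat" and "dog" tokens
boxes = {2: (0, 0, 8, 8), 5: (8, 8, 16, 16)}   # (x0, y0, x1, y1) per token
P = 0.2

def top_p_inside_mass(attn_map, box, P):
    # Mean of the top-P fraction of attention values inside the box;
    # a larger P averages over more positions, softening the constraint.
    x0, y0, x1, y1 = box
    inside = attn_map[y0:y1, x0:x1].flatten()
    k = max(1, int(P * inside.numel()))
    return torch.topk(inside, k).values.mean()

attn_maps = {i: torch.rand(16, 16) for i in token_indices}  # toy maps
for i in token_indices:
    print(i, float(top_p_inside_mass(attn_maps[i], boxes[i], P)))
```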
Maintenance & Community

The codebase builds on the diffusers fork and adapts code from Google and yuval-alaluf repositories.

Licensing & Compatibility

No license is specified in the repository.
Limitations & Caveats
The project is explicitly tested only against PyTorch==1.12.0, suggesting potential compatibility issues with newer versions. The lack of a specified license and of community channels may hinder adoption and long-term support.
Last updated 10 months ago; the repository is marked inactive.