Research paper for text-to-image generation via depth-driven instance synthesis
Top 96.5% on sourcepulse
3DIS is a framework for text-to-image generation that leverages depth information for decoupled instance synthesis. It targets researchers and developers in generative AI, offering a novel approach to controlling object placement and composition in generated images. The primary benefit is enhanced control over scene layout and instance generation through depth-driven synthesis.
How It Works
3DIS employs a depth-driven approach, first generating a depth map based on textual prompts and layout specifications. This depth map then guides the synthesis of individual instances, allowing for decoupled control over object placement, scale, and orientation. The framework integrates with diffusion models like FLUX and Stable Diffusion 1.x for rendering, enabling the creation of complex scenes with multiple objects.
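To make the decoupled two-stage flow concrete, here is a minimal conceptual sketch. It is not the repository's API: the Instance dataclass, generate_scene_depth, render_with_backbone, and the example layout are hypothetical placeholders standing in for the layout-to-depth stage and the depth-guided rendering stage described above.

```python
# Conceptual sketch of the decoupled flow (hypothetical names, not 3DIS's actual API).
# Stage 1 turns a coarse layout of boxes plus per-instance prompts into a scene depth map;
# stage 2 renders per-instance detail with a pretrained diffusion backbone guided by that map.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    prompt: str                              # per-instance text, e.g. "a wooden park bench"
    box: Tuple[float, float, float, float]   # normalized layout box (x0, y0, x1, y1)


def generate_scene_depth(instances: List[Instance], global_prompt: str):
    """Stage 1 (layout-to-depth): map the coarse layout to a depth map that fixes
    placement and relative scale of every instance. Stubbed for illustration."""
    ...


def render_with_backbone(depth_map, instances: List[Instance], backbone: str = "FLUX"):
    """Stage 2 (detail rendering): a pretrained diffusion model (e.g. FLUX or SD1.x)
    fills in each instance's appearance, guided by the shared depth map. Stubbed."""
    ...


layout = [
    Instance("a golden retriever lying on the grass", (0.05, 0.55, 0.45, 0.95)),
    Instance("a wooden park bench", (0.50, 0.50, 0.95, 0.90)),
]
depth = generate_scene_depth(layout, global_prompt="a sunny park in spring")
image = render_with_backbone(depth, layout, backbone="FLUX")
```

Because the depth map is produced once for the whole scene, the rendering stage can be swapped between backbones without re-specifying the layout, which is the decoupling the framework is named for.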
Quick Start & Requirements
Environment setup: create and activate a conda environment with conda create -n 3DIS python=3.10 and conda activate 3DIS, then install the dependencies with pip install -r requirement.txt followed by pip install -e . in the repository root. The segment-anything-2 library also needs to be installed from its source directory with pip install -e . --no-deps. Pretrained checkpoints are downloaded into the pretrained_weights folder.
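As a quick post-install sanity check, a small helper like the one below can confirm that the pretrained_weights folder is in place before running inference. The folder name comes from the setup notes above; the script itself is illustrative and not part of the repository.

```python
# Hypothetical post-install check (not part of 3DIS): verify that the
# pretrained_weights folder exists and list its contents, so missing
# checkpoints are caught before any generation run.
from pathlib import Path
import sys

weights_dir = Path("pretrained_weights")  # folder name taken from the setup notes above

if not weights_dir.is_dir():
    sys.exit("pretrained_weights/ not found -- download the checkpoints listed in the README first.")

files = sorted(p.name for p in weights_dir.iterdir())
print(f"{len(files)} entries in {weights_dir}/:")
for name in files:
    print("  -", name)
```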
Highlighted Details
Maintenance & Community
The project is associated with authors Dewei Zhou, Ji Xie, Zongxin Yang, and Yi Yang. Links to the project page and papers are provided.
Licensing & Compatibility
The repository does not explicitly state a license in the README.
Limitations & Caveats
The README notes that SD1.x has limited generation capabilities and suggests using other base models for better results. The project is under active development, with a "To Do List" indicating ongoing work.