3DIS  by limuloo

Research paper for text-to-image generation via depth-driven instance synthesis

created 9 months ago
268 stars

Top 96.5% on sourcepulse

Project Summary

3DIS is a framework for text-to-image generation that leverages depth information for decoupled instance synthesis. It targets researchers and developers in generative AI, offering a novel approach to control object placement and composition in generated images. The primary benefit is enhanced control over scene layout and instance generation through depth-driven synthesis.

How It Works

3DIS employs a depth-driven approach, first generating a depth map based on textual prompts and layout specifications. This depth map then guides the synthesis of individual instances, allowing for decoupled control over object placement, scale, and orientation. The framework integrates with diffusion models like FLUX and Stable Diffusion 1.x for rendering, enabling the creation of complex scenes with multiple objects.

Quick Start & Requirements

  • Installation: Requires a Conda environment (conda create -n 3DIS python=3.10, conda activate 3DIS), followed by pip install -r requirement.txt and pip install -e .. The segment-anything-2 library must also be installed with pip install -e . --no-deps.
  • Prerequisites: PyTorch, CUDA (implied for deep learning models), and specific checkpoints for Text-to-Depth, Layout-to-Depth Adapter, and SAM2 are required.
  • Resources: Checkpoints need to be downloaded and placed in a pretrained_weights folder.
  • Demos: Examples for layout-to-depth generation, rendering with FLUX and SD1.x, and end-to-end generation are provided.
  • GUI: A GUI for depth map creation and FLUX rendering is available.
  • Docs: Project page and paper links are provided.
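The installation steps above can be collected into a single setup script. This is a sketch under stated assumptions: the clone URL is inferred from the author handle, and segment-anything-2 is assumed to live as a subdirectory of the checkout; check the project README for the actual locations and checkpoint download links.

```shell
# Assumed clone URL (inferred from the author handle "limuloo") -- verify on the project page.
git clone https://github.com/limuloo/3DIS.git
cd 3DIS

# Create and activate the Conda environment described in the README
conda create -n 3DIS python=3.10 -y
conda activate 3DIS

# Install dependencies and the 3DIS package itself
pip install -r requirement.txt
pip install -e .

# Install segment-anything-2 without its own dependencies
# (assumed to be a subdirectory; adjust the path if you clone it separately)
cd segment-anything-2
pip install -e . --no-deps
cd ..

# Checkpoints (Text-to-Depth, Layout-to-Depth Adapter, SAM2) go here;
# download links are listed in the README
mkdir -p pretrained_weights
```

Once the environment is set up, the layout-to-depth and rendering demos mentioned above can be run from the repository root.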

Highlighted Details

  • Accepted by ICLR 2025 as a spotlight paper.
  • Supports rendering with FLUX and Stable Diffusion 1.x, with recommendations for other base models.
  • Offers end-to-end layout-to-image generation capabilities.
  • Includes a GUI for interactive scene depth map creation and rendering.
  • Code released for SD1.x rendering.

Maintenance & Community

The project is associated with authors Dewei Zhou, Ji Xie, Zongxin Yang, and Yi Yang. Links to the project page and papers are provided.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The README notes that SD1.x has limited generation capability and recommends other base models for better results. A "To Do List" in the README indicates the project is under active development.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History: 11 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 7 more.

stable-dreamfusion by ashawkey

Text-to-3D model using NeRF and diffusion

Top 0.1% on sourcepulse · 9k stars
created 2 years ago · updated 1 year ago