3DIS  by limuloo

Research paper for text-to-image generation via depth-driven instance synthesis

created 9 months ago
268 stars

Top 96.5% on sourcepulse

Project Summary

3DIS is a framework for text-to-image generation that leverages depth information for decoupled instance synthesis. It targets researchers and developers in generative AI, offering a novel approach to control object placement and composition in generated images. The primary benefit is enhanced control over scene layout and instance generation through depth-driven synthesis.

How It Works

3DIS employs a depth-driven approach, first generating a depth map based on textual prompts and layout specifications. This depth map then guides the synthesis of individual instances, allowing for decoupled control over object placement, scale, and orientation. The framework integrates with diffusion models like FLUX and Stable Diffusion 1.x for rendering, enabling the creation of complex scenes with multiple objects.

Quick Start & Requirements

  • Installation: Requires a Conda environment (conda create -n 3DIS python=3.10, conda activate 3DIS), followed by pip install -r requirement.txt and pip install -e .. The segment-anything-2 library must also be installed with pip install -e . --no-deps.
  • Prerequisites: PyTorch, CUDA (implied for deep learning models), and specific checkpoints for Text-to-Depth, Layout-to-Depth Adapter, and SAM2 are required.
  • Resources: Checkpoints need to be downloaded and placed in a pretrained_weights folder.
  • Demos: Examples for layout-to-depth generation, rendering with FLUX and SD1.x, and end-to-end generation are provided.
  • GUI: A GUI for depth map creation and FLUX rendering is available.
  • Docs: Project page and paper links are provided.
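The installation steps above can be collected into a single setup script. This is a sketch under stated assumptions: the clone URL is inferred from the author handle, and segment-anything-2 is assumed to live as a subdirectory of the checkout; check the project README for the actual locations and checkpoint download links.

```shell
# Assumed clone URL (inferred from the author handle "limuloo") -- verify on the project page.
git clone https://github.com/limuloo/3DIS.git
cd 3DIS

# Create and activate the Conda environment described in the README
conda create -n 3DIS python=3.10 -y
conda activate 3DIS

# Install dependencies and the 3DIS package itself
pip install -r requirement.txt
pip install -e .

# Install segment-anything-2 without its own dependencies
# (assumed to be a subdirectory; adjust the path if you clone it separately)
cd segment-anything-2
pip install -e . --no-deps
cd ..

# Checkpoints (Text-to-Depth, Layout-to-Depth Adapter, SAM2) go here;
# download links are listed in the README
mkdir -p pretrained_weights
```

Once the environment is set up, the layout-to-depth and rendering demos mentioned above can be run from the repository root.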

Highlighted Details

  • Accepted by ICLR 2025 as a spotlight paper.
  • Supports rendering with FLUX and Stable Diffusion 1.x, with recommendations for other base models.
  • Offers end-to-end layout-to-image generation capabilities.
  • Includes a GUI for interactive scene depth map creation and rendering.
  • Code released for SD1.x rendering.

Maintenance & Community

The project is associated with authors Dewei Zhou, Ji Xie, Zongxin Yang, and Yi Yang. Links to the project page and papers are provided.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The README notes that SD1.x has limited generation capability and recommends other base models for better results. A "To Do List" in the README indicates the project is under active development.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History: 11 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 7 more.

stable-dreamfusion by ashawkey

Text-to-3D model using NeRF and diffusion

Top 0.1% on sourcepulse · 9k stars
created 2 years ago · updated 1 year ago