SpatialGen by manycore-research

3D indoor scene generation guided by layouts

Created 6 months ago

355 stars

Top 78.9% on SourcePulse

Project Summary

SpatialGen generates realistic 3D indoor scenes conditioned on a 3D semantic layout. It targets researchers and developers in computer vision and AI who need to synthesize indoor environments from a layout plus either a reference image or a text description, with the aim of speeding up 3D content creation and scene understanding research.

How It Works

SpatialGen employs a multi-view, multi-modal diffusion model to generate 3D indoor scenes. The core innovation lies in its ability to condition scene generation on a provided 3D semantic layout, which can then be further guided by either a reference image (image-to-scene) or a textual description (text-to-scene). This multi-modal conditioning allows for flexible and precise control over the generated scene's appearance and content.
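The two conditioning modes can be pictured with a minimal Python sketch. It is not SpatialGen's API: SceneRequest and select_script are hypothetical names introduced only for illustration, and the rule that exactly one guidance signal is supplied is an assumption drawn from the "either ... or" wording above. The only names taken from the project are the two inference script paths listed in the quick start.

```python
from dataclasses import dataclass
from typing import Optional

# Script paths as named in the project's quick start.
I2S_SCRIPT = "scripts/infer_spatialgen_i2s.sh"  # image-to-scene
T2S_SCRIPT = "scripts/infer_spatialgen_t2s.sh"  # text-to-scene


@dataclass(frozen=True)
class SceneRequest:
    """Hypothetical container for one generation request (not a SpatialGen class)."""

    layout_path: str                       # 3D semantic layout, required in both modes
    reference_image: Optional[str] = None  # set for image-to-scene
    prompt: Optional[str] = None           # set for text-to-scene


def select_script(req: SceneRequest) -> str:
    """Return the inference script that matches the request's guidance signal."""
    if not req.layout_path:
        raise ValueError("a 3D semantic layout is required in both modes")
    has_image = req.reference_image is not None
    has_text = req.prompt is not None
    # Assumption: a request carries exactly one of the two guidance signals.
    if has_image == has_text:
        raise ValueError("provide exactly one of reference_image or prompt")
    return I2S_SCRIPT if has_image else T2S_SCRIPT
```

The layout is mandatory in both branches, which mirrors the description: the layout fixes the scene structure, and the image or text only steers appearance and content.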

Quick Start & Requirements

Installation: Clone the repository, create a Python 3.10 virtual environment, and install dependencies via pip install -r requirements.txt. An optional fix for CUDA 12.1 is available: pip install nvidia-cublas-cu12==12.4.5.8.
Prerequisites: Python 3.10, PyTorch 2.3.1, CUDA 12.1.
Models: Pre-trained models are available on HuggingFace: SpatialGen-1.0 and FLUX.1-Wireframe-dev-lora.
Dataset: A test set of 48 rooms, including 3D layouts and rendered images, is provided for inference.
Inference: Scripts are available for both image-to-scene (scripts/infer_spatialgen_i2s.sh) and text-to-scene (scripts/infer_spatialgen_t2s.sh) generation.
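Before running the inference scripts, it can help to confirm that the environment matches the stated prerequisites. The following is a minimal sketch of such a check, not something shipped with the project: the function names are hypothetical, and the prefix-matching rule (which accepts build suffixes such as "+cu121") is an assumption about what counts as a match.

```python
import sys
from typing import Dict, List, Optional

# Versions stated in the prerequisites above.
REQUIRED = {"python": "3.10", "torch": "2.3.1", "cuda": "12.1"}


def version_matches(found: Optional[str], required: str) -> bool:
    """True if `found` begins with the required components, ignoring a '+suffix'."""
    if found is None:
        return False
    core = str(found).split("+")[0]
    wanted = required.split(".")
    return core.split(".")[: len(wanted)] == wanted


def check_versions(found: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per component that does not match REQUIRED."""
    problems = []
    for name, required in REQUIRED.items():
        if not version_matches(found.get(name), required):
            problems.append(f"{name}: expected {required}, found {found.get(name)}")
    return problems


def detect_versions() -> Dict[str, Optional[str]]:
    """Read versions from the running interpreter and, if installed, PyTorch."""
    found: Dict[str, Optional[str]] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        import torch

        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pass  # PyTorch not installed; reported as a mismatch
    return found


if __name__ == "__main__":
    report = check_versions(detect_versions())
    for line in report or ["environment matches the stated prerequisites"]:
        print(line)
```

Running the file inside the project's virtual environment prints either the mismatched components or a single confirmation line.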

Highlighted Details

Supports both image-to-3D scene generation and text-to-3D scene generation, conditioned on 3D semantic layouts.
Provides inference scripts and a test dataset for immediate use.
Leverages established diffusion models (Stable-Diffusion-v2.1, FLUX) for its generative capabilities.

Maintenance & Community

The project is associated with the HKUST Spatial Artificial Intelligence Lab and Manycore Tech Inc. Specific community links (e.g., Discord, Slack) or detailed roadmap information beyond the initial release plan are not provided in the README.

Licensing & Compatibility

SpatialGen-1.0 is derived from Stable-Diffusion-v2.1 and is released under the CreativeML Open RAIL++-M License. The accompanying FLUX.1-Wireframe-dev-lora model, however, is released under the FLUX.1-dev Non-Commercial License, which restricts commercial use.

Limitations & Caveats

The current release covers inference only; training instructions and the full SpatialGen dataset are planned for a future release. The non-commercial license of the FLUX.1-Wireframe-dev-lora component restricts its use in commercial applications.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

9 stars in the last 30 days