SCube by nv-tlabs

Scene reconstruction research paper using voxels and splats

created 9 months ago
483 stars

Top 64.4% on sourcepulse

Project Summary

SCube addresses large-scale 3D scene reconstruction, producing detailed and coherent scene representations almost instantly. It targets researchers and engineers working with complex 3D environments and combines sparse voxels with Gaussian splats (VoxSplats) for scene synthesis and reconstruction. Its main benefit is reconstructing and representing very large scenes efficiently.

How It Works

SCube employs a multi-stage generative approach. It first uses a VAE to encode scene geometry into a latent voxel representation, followed by a diffusion model for detailed geometry reconstruction. Finally, a Gaussian Splatting model (GSM) is used for appearance reconstruction. This cascaded approach allows for efficient handling of large-scale scenes by progressively refining the representation from coarse to fine details.
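The coarse-to-fine idea can be illustrated with a toy refinement loop over a sparse voxel set: each occupied voxel is split into its 8 children, and a predictor decides which children remain occupied. This is a minimal sketch under stated assumptions, not SCube's code. It assumes the geometry is refined on a sparse voxel hierarchy, and it replaces the learned VAE and diffusion stages with a hypothetical `inside_ball` predictor that simply keeps voxels whose centers fall inside a sphere. The helpers `subdivide` and `cascade` are illustrative names.

```python
from typing import Callable, Set, Tuple

Voxel = Tuple[int, int, int]  # integer voxel index at a given level
Predictor = Callable[[Voxel, int], bool]  # (voxel, level) -> keep as occupied?


def subdivide(voxels: Set[Voxel]) -> Set[Voxel]:
    """Split each occupied voxel into its 8 children at the next, 2x finer level."""
    return {
        (2 * x + dx, 2 * y + dy, 2 * z + dz)
        for (x, y, z) in voxels
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    }


def cascade(coarse: Set[Voxel], num_levels: int, keep: Predictor) -> Set[Voxel]:
    """Refine a coarse sparse grid level by level: subdivide, then prune."""
    voxels = coarse
    for level in range(1, num_levels + 1):
        # In SCube a learned model would decide occupancy; here `keep` stands in for it.
        voxels = {v for v in subdivide(voxels) if keep(v, level)}
    return voxels


def inside_ball(v: Voxel, level: int, base_res: int = 4, radius: float = 0.3) -> bool:
    """Hypothetical stand-in predictor: is the voxel center inside a ball in the unit cube?"""
    res = base_res * 2 ** level
    return sum(((c + 0.5) / res - 0.5) ** 2 for c in v) <= radius ** 2


# Level 0: a 4x4x4 grid over the unit cube, keeping only voxels inside the ball.
coarse = {
    (x, y, z)
    for x in range(4) for y in range(4) for z in range(4)
    if inside_ball((x, y, z), 0)
}
level1 = cascade(coarse, 1, inside_ball)
level2 = cascade(coarse, 2, inside_ball)
print(len(coarse), len(level1), len(level2))
```

Work is only ever spent on children of voxels that survived the previous level. This is why a cascade scales to large scenes better than predicting a dense fine grid in one shot.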

Quick Start & Requirements

  • Installation: Clone the repository and create a Conda environment using environment.yml.
  • Prerequisites: Python 3.x, Conda with conda-libmamba-solver, mmcv>=2.0.0, mmsegmentation>=1.0.0, and Weights & Biases (WandB) for logging. The Waymo Open Dataset (v1.4.2) is required for training and inference.
  • Data Processing: Converting the Waymo TFRecords to WebDataset format takes over a day on 8x A100 GPUs. The conversion runs SegFormer to produce sky masks and Metric3Dv2 to produce ground-truth (GT) depth.
  • Links: Project Page
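The conversion target is WebDataset, whose shards are plain tar archives in which files sharing a key prefix (for example `000000.rgb.jpg` and `000000.depth.npy`) form one sample. The sketch below packs and re-reads such a shard using only the standard library, to show what the converted data looks like. The field names (`rgb.jpg`, `sky_mask.png`, `depth.npy`, `meta.json`) and the helpers `write_shard` and `read_shard` are hypothetical. SCube's actual conversion script, keys, and encodings may differ, and in practice a dedicated loader such as the `webdataset` package would stream these shards.

```python
import io
import json
import os
import tarfile
import tempfile
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, bytes]]  # (key, {extension: payload})


def write_shard(path: str, samples: List[Sample]) -> None:
    """Write samples to a WebDataset-style tar: each file is named '<key>.<ext>'."""
    with tarfile.open(path, "w") as tar:
        for key, fields in samples:
            # Files of one sample are written contiguously, as WebDataset expects.
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(path: str) -> Dict[str, Dict[str, bytes]]:
    """Group tar members back into samples by the part of the name before the first dot."""
    samples: Dict[str, Dict[str, bytes]] = defaultdict(dict)
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            key, ext = member.name.split(".", 1)
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)


# Hypothetical per-frame fields with placeholder payloads (not real image/depth data).
frames = [
    (f"{i:06d}", {
        "rgb.jpg": b"<jpeg bytes>",
        "sky_mask.png": b"<png bytes>",
        "depth.npy": b"<npy bytes>",
        "meta.json": json.dumps({"frame": i}).encode(),
    })
    for i in range(2)
]
shard_path = os.path.join(tempfile.mkdtemp(), "shard-000000.tar")
write_shard(shard_path, frames)
restored = read_shard(shard_path)
print(sorted(restored), sorted(restored["000000"]))
```

Storing each frame's RGB, sky mask, and depth under one key keeps them together in sequential reads, which is the main reason to pay the one-time conversion cost.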

Highlighted Details

  • Leverages a cascaded VAE-Diffusion-GSM pipeline for scene reconstruction.
  • Utilizes VoxSplats for efficient and high-fidelity scene representation.
  • Supports large-scale scene reconstruction, demonstrated with the Waymo dataset.
  • Offers inference for individual components (VAE, Diffusion, GSM) and a full pipeline.
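To make the VoxSplat idea concrete, the sketch below assumes that each occupied voxel carries one Gaussian parameterized as in standard 3D Gaussian Splatting. Each Gaussian has a mean offset from the voxel center, per-axis scales, a rotation quaternion, an opacity, and a color. Its covariance follows the usual 3DGS construction Sigma = R S S^T R^T. The `VoxSplat` dataclass, its field layout, and the one-Gaussian-per-voxel choice are assumptions for illustration, not SCube's actual data structure.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


@dataclass
class VoxSplat:
    """Hypothetical container: one Gaussian attached to one sparse voxel."""
    ijk: Tuple[int, int, int]                 # integer voxel index
    offset: Vec3                              # mean offset from voxel center, in voxel units
    scale: Vec3                               # per-axis standard deviations
    quat: Tuple[float, float, float, float]   # rotation as (w, x, y, z)
    opacity: float
    rgb: Vec3


def quat_to_rot(q: Tuple[float, float, float, float]) -> Mat3:
    """Convert a (possibly unnormalized) quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(s: VoxSplat) -> Mat3:
    """Sigma = R S S^T R^T with S = diag(scale)."""
    r = quat_to_rot(s.quat)
    m = [[r[i][j] * s.scale[j] for j in range(3)] for i in range(3)]  # M = R @ S
    return [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


def world_mean(s: VoxSplat, voxel_size: float, origin: Vec3 = (0.0, 0.0, 0.0)) -> Vec3:
    """Gaussian center in world units: voxel center plus offset, scaled by voxel size."""
    return tuple(origin[a] + (s.ijk[a] + 0.5 + s.offset[a]) * voxel_size for a in range(3))


splat = VoxSplat((2, 0, 0), (0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0), 0.9, (1.0, 1.0, 1.0))
print(world_mean(splat, voxel_size=0.5), covariance(splat))
```

Anchoring each Gaussian to a voxel lets the geometry stages decide where splats may exist, while the appearance stage only needs to predict per-voxel parameters.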

Maintenance & Community

The project comes from NVIDIA's Toronto AI Lab (nv-tlabs); related works include InfiniCube and XCube. The README lists no community channels such as Discord or Slack.

Licensing & Compatibility

Licensed under the NVIDIA Source Code License, which may restrict commercial use and distribution.

Limitations & Caveats

The data processing pipeline is computationally intensive and slow. The project relies heavily on Weights & Biases for experiment tracking, and some MMCV versions may cause compatibility issues. Review the license carefully before any commercial use.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 27 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley), and 4 more.

taming-transformers by CompVis

0.1%
6k stars
Image synthesis research paper using transformers
created 4 years ago
updated 1 year ago