semantic-gaussians  by sharinka0715

Open-vocabulary 3D scene understanding via Gaussian Splatting

Created 2 years ago
252 stars

Top 99.6% on SourcePulse

GitHubView on GitHub
1 Expert Loves This Project
Project Summary

Summary

Semantic Gaussians, the official implementation of the paper "Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting," addresses challenges in open-vocabulary 3D scene understanding for applications like embodied agents and augmented reality. It offers a novel approach by distilling pre-trained 2D semantic features directly into 3D Gaussian Splatting representations, enabling versatile scene analysis without requiring additional NeRF training. This method provides significant improvements in semantic segmentation tasks and supports diverse downstream applications.

How It Works

The core innovation lies in distilling 2D semantic features into 3D Gaussians. A versatile projection method maps features from pre-trained image encoders into a novel semantic component of 3D Gaussians. For efficient inference, a dedicated 3D semantic network directly predicts these semantic components from raw 3D Gaussians, bypassing the need for extensive retraining. This approach leverages existing 2D vision-language models to enrich 3D scene representations.

Quick Start & Requirements

  • Install: Clone the repository (git clone --recursive), create a Conda environment (conda env create -f environment.yaml), activate it (conda activate sega), and install pip dependencies (pip install -r requirements.txt). Compilation of MinkowskiEngine is also required, with specific instructions provided for Anaconda/CUDA 11.x.
  • Prerequisites: Tested on Ubuntu 22.04 with an NVIDIA RTX 4090. Recommends Linux and an NVIDIA GPU with >= 16GB VRAM. CUDA is mandatory.
  • Links: Official repository: https://github.com/sharinka0715/semantic-gaussians.

Highlighted Details

  • Achieves state-of-the-art results on ScanNet-20, with a 9.3% mIoU and 6.5% mAcc improvement over prior open-vocabulary methods.
  • Demonstrates strong qualitative performance in object part segmentation, scene editing, and spatial-temporal segmentation.
  • Enables fast inference through a dedicated 3D semantic network.

Maintenance & Community

The project released its initial implementation in May 2024 and has since addressed dependency issues. No specific community channels (e.g., Discord, Slack), roadmap links, or notable contributor information are detailed in the README.

Licensing & Compatibility

The repository's license is not specified in the README, which is a critical omission for evaluating commercial use or integration into closed-source projects. CUDA is required, limiting compatibility to systems with NVIDIA GPUs.

Limitations & Caveats

The code has not been evaluated on Windows machines and may not be supported. macOS is explicitly not supported due to CUDA dependencies.

Health Check
Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 30 days

Explore Similar Projects

Starred by Jianwei Yang Jianwei Yang(Research Scientist at Meta Superintelligence Lab), Jiaming Song Jiaming Song(Chief Scientist at Luma AI), and
5 more.

X-Decoder by microsoft

0%
1k
Generalized decoding model for pixel, image, and language tasks
Created 3 years ago
Updated 2 years ago
Feedback? Help us improve.