OpenScene provides a zero-shot framework for 3D scene understanding using open-vocabulary queries, enabling tasks like semantic segmentation, rare object search, and image-based object detection. It targets researchers and practitioners in 3D computer vision who need flexible scene analysis beyond predefined categories.
How It Works
OpenScene fuses per-pixel features from multi-view 2D images onto 3D point clouds. This combines the semantic knowledge of large-scale 2D vision-language models (such as OpenSeg or LSeg) with the geometric context of 3D data. Because the projected features share an embedding space with text, the scene can be queried with arbitrary text describing objects, properties, or activities.
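The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the function name `fuse_multiview_features`, the pinhole camera model, nearest-pixel sampling, and simple averaging across views are all assumptions, and occlusion testing against depth maps is omitted for brevity.

```python
import numpy as np


def fuse_multiview_features(points, feature_maps, intrinsics, world_to_cam):
    """Average per-pixel 2D features onto 3D points across views.

    points:        (N, 3) world-space coordinates
    feature_maps:  list of (H, W, C) arrays, one per view
    intrinsics:    list of (3, 3) pinhole camera matrices (assumed model)
    world_to_cam:  list of (4, 4) extrinsic matrices
    Returns (N, C) fused features and an (N,) count of views that saw each point.
    """
    n = points.shape[0]
    c = feature_maps[0].shape[2]
    summed = np.zeros((n, c), dtype=np.float64)
    counts = np.zeros(n, dtype=np.int64)
    homog = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous coords

    for fmap, K, T in zip(feature_maps, intrinsics, world_to_cam):
        h, w, _ = fmap.shape
        cam = (T @ homog.T).T[:, :3]           # points in the camera frame
        z = cam[:, 2]
        in_front = z > 1e-6                    # discard points behind the camera
        safe_z = np.where(in_front, z, 1.0)    # avoid division by zero
        pix = (K @ cam.T).T
        u = np.round(pix[:, 0] / safe_z).astype(np.int64)  # pixel column
        v = np.round(pix[:, 1] / safe_z).astype(np.int64)  # pixel row
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # Note: no occlusion check here; a point hidden behind a wall
        # would still pick up the wall's feature in this sketch.
        summed[visible] += fmap[v[visible], u[visible]]
        counts[visible] += 1

    fused = np.zeros_like(summed)
    seen = counts > 0
    fused[seen] = summed[seen] / counts[seen, None]
    return fused, counts
```

Points that no view observes keep a zero feature and a count of 0, so downstream code can mask them out rather than treating them as valid predictions.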
Quick Start & Requirements
- Installation: Follow installation.md.
- Prerequisites: PyTorch, CUDA (implied for feature extraction/training), specific datasets (ScanNet, Matterport3D, nuScenes, Replica).
- Data: Pre-processed datasets and multi-view fused features are available for download via scripts/download_dataset.sh and scripts/download_fused_features.sh. These downloads can be substantial (e.g., 234.8G for ScanNet features).
- Demo: An interactive, real-time demo is available, requiring no GPU.
- Links: Paper, Video, Project Page.
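Given the download sizes mentioned above, it can help to check free disk space before running the download scripts. The helper below is a hypothetical convenience, not part of the repository; it uses only the standard library, and it assumes the quoted "234.8G" means roughly 234.8 gigabytes.

```python
import shutil


def has_room(path, required_gb, margin=1.1):
    """Return True if `path` has at least required_gb * margin gigabytes free.

    `margin` leaves headroom for archives that are unpacked after download
    (the 10% default is an arbitrary assumption).
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb * margin
```

For example, `has_room(".", 234.8)` reports whether the current directory's filesystem could hold the ScanNet fused features with some headroom.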
Highlighted Details
- Supports zero-shot semantic segmentation with arbitrary text queries (e.g., "snoopy", "soft", "metal", "cooking").
- Enables open-vocabulary 3D scene exploration beyond fixed semantic labels.
- Provides pre-processed data and fused features for multiple popular 3D datasets.
- Includes code for distillation to train custom 3D models.
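The text-query and distillation ideas in the list above can be illustrated with a short NumPy sketch. It assumes per-point features and text embeddings already live in a shared embedding space (in practice the text embeddings would come from the text encoder paired with the 2D model; plain arrays stand in for them here). The function names are hypothetical, and the cosine-based distillation loss is one common choice, shown as an assumption rather than as the repository's exact training objective.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length, guarding against zero vectors."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def label_points(point_feats, text_embeds):
    """Assign each point the index of its most similar text query.

    point_feats: (N, C) per-point features
    text_embeds: (Q, C) one embedding per text query
    Returns (N,) query indices and the (N, Q) cosine similarity matrix.
    """
    sims = l2_normalize(point_feats) @ l2_normalize(text_embeds).T
    return sims.argmax(axis=1), sims


def cosine_distillation_loss(pred_feats, target_feats):
    """Mean of (1 - cosine similarity) between a 3D model's predicted
    features and the fused 2D features used as targets."""
    cos = np.sum(l2_normalize(pred_feats) * l2_normalize(target_feats), axis=-1)
    return float(np.mean(1.0 - cos))
```

Because labels come from a similarity ranking rather than a fixed classifier head, changing the query set (say, from object names to materials such as "metal") only requires new text embeddings, not retraining.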
Maintenance & Community
- Code released March 2023, with recent updates in October 2023 for LSeg feature extraction.
- TODO list indicates ongoing development, including support for arbitrary scenes and web demos.
- Contributions are welcomed.
Licensing & Compatibility
- The README does not explicitly state a license. Code is provided for research purposes.
Limitations & Caveats
- The project relies heavily on large pre-processed datasets and fused features, so expect significant storage use and download time (e.g., 234.8G for the ScanNet features alone).
- Some multi-view fused features (e.g., LSeg for Matterport/nuScenes, OpenSeg for nuScenes) are noted as "coming soon" or missing.
- The project is associated with CVPR 2023, suggesting a research focus; production readiness may vary.