openscene by pengsongyou

Research code for open-vocabulary 3D scene understanding

created 2 years ago
744 stars

Top 47.6% on sourcepulse

Project Summary

OpenScene provides a zero-shot framework for 3D scene understanding using open-vocabulary queries, enabling tasks like semantic segmentation, rare object search, and image-based object detection. It targets researchers and practitioners in 3D computer vision who need flexible scene analysis beyond predefined categories.

How It Works

OpenScene leverages multi-view 2D image features fused onto 3D point clouds. This approach combines the rich semantic understanding of large-scale 2D vision models (like OpenSeg or LSeg) with the geometric context of 3D data. By projecting 2D features onto 3D points, it achieves open-vocabulary capabilities, allowing queries based on arbitrary text descriptions, properties, or activities.
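
The projection step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name `fuse_multiview_features`, its argument layout, the pinhole camera model, and the plain averaging across views are all assumptions, and the sketch has no occlusion test, which a real pipeline would need.

```python
import numpy as np

def fuse_multiview_features(points, feature_maps, intrinsics, world_to_cam):
    """Average per-pixel 2D features onto 3D points across views (hypothetical helper).

    points:       (N, 3) world-space coordinates
    feature_maps: list of (H, W, C) arrays, one per view
    intrinsics:   list of (3, 3) pinhole camera matrices
    world_to_cam: list of (4, 4) extrinsic matrices
    """
    n_points = points.shape[0]
    n_channels = feature_maps[0].shape[2]
    feat_sum = np.zeros((n_points, n_channels))
    hit_count = np.zeros(n_points)
    homogeneous = np.hstack([points, np.ones((n_points, 1))])

    for fmap, K, T in zip(feature_maps, intrinsics, world_to_cam):
        height, width, _ = fmap.shape
        # Transform points into the camera frame.
        cam = (T @ homogeneous.T).T[:, :3]
        in_front = cam[:, 2] > 1e-6
        # Use a dummy depth for points behind the camera to avoid dividing by <= 0.
        z = np.where(in_front, cam[:, 2], 1.0)
        # Perspective projection to pixel coordinates.
        pix = (K @ cam.T).T
        u = np.round(pix[:, 0] / z).astype(int)
        v = np.round(pix[:, 1] / z).astype(int)
        visible = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        # Accumulate the feature at each visible point's pixel.
        feat_sum[visible] += fmap[v[visible], u[visible]]
        hit_count[visible] += 1

    # Average over the views that saw each point; unseen points stay zero.
    fused = np.zeros_like(feat_sum)
    seen = hit_count > 0
    fused[seen] = feat_sum[seen] / hit_count[seen, None]
    return fused, seen
```

The returned `seen` mask marks points that landed inside at least one view, so downstream code can skip points that never received a feature.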

Quick Start & Requirements

  • Installation: Follow installation.md.
  • Prerequisites: PyTorch, CUDA (implied for feature extraction/training), specific datasets (ScanNet, Matterport3D, nuScenes, Replica).
  • Data: Pre-processed datasets and multi-view fused features are available for download via scripts/download_dataset.sh and scripts/download_fused_features.sh. These downloads can be substantial (e.g., 234.8G for ScanNet features).
  • Demo: An interactive, real-time demo is available, requiring no GPU.
  • Links: Paper, Video, Project Page.

Highlighted Details

  • Supports zero-shot semantic segmentation with arbitrary text queries (e.g., "snoopy", "soft", "metal", "cooking").
  • Enables open-vocabulary 3D scene exploration beyond fixed semantic labels.
  • Provides pre-processed data and fused features for multiple popular 3D datasets.
  • Includes code for distillation to train custom 3D models.
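
One common way to turn per-point features into a text-driven segmentation is cosine similarity against text embeddings, sketched below. This is an assumption-laden illustration rather than the project's code: `label_points` is a hypothetical helper, and the text embeddings are assumed to come from a text encoder that shares an embedding space with the 2D features fused onto the points.

```python
import numpy as np

def label_points(point_feats, text_feats):
    """Assign each point the index of its most similar text query (hypothetical helper).

    point_feats: (N, C) per-point features
    text_feats:  (Q, C) one embedding per text query, assumed to be in the same space
    """
    # L2-normalize both sets so the dot product equals cosine similarity.
    p = point_feats / (np.linalg.norm(point_feats, axis=1, keepdims=True) + 1e-8)
    t = text_feats / (np.linalg.norm(text_feats, axis=1, keepdims=True) + 1e-8)
    similarity = p @ t.T  # (N, Q)
    return similarity.argmax(axis=1), similarity
```

With queries such as "soft" and "metal", each point would take the label of the higher-scoring query; the raw `similarity` matrix can also be rendered as a heatmap for single-query search.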

Maintenance & Community

  • Code released March 2023, with recent updates in October 2023 for LSeg feature extraction.
  • TODO list indicates ongoing development, including support for arbitrary scenes and web demos.
  • Contributions are welcomed.

Licensing & Compatibility

  • The README does not explicitly state a license. Code is provided for research purposes.

Limitations & Caveats

  • The project relies heavily on large pre-processed datasets and fused features, requiring significant storage and download time.
  • Some multi-view fused features (e.g., LSeg for Matterport/nuScenes, OpenSeg for nuScenes) are noted as "coming soon" or missing.
  • The project is associated with CVPR 2023, suggesting a research focus; production readiness may vary.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 32 stars in the last 90 days
