openscene  by pengsongyou

Research code for 3D scene understanding with open-vocabulary queries

Created 2 years ago
753 stars

Top 46.1% on SourcePulse

Project Summary

OpenScene provides a zero-shot framework for 3D scene understanding using open-vocabulary queries, enabling tasks like semantic segmentation, rare object search, and image-based object detection. It targets researchers and practitioners in 3D computer vision who need flexible scene analysis beyond predefined categories.

How It Works

OpenScene fuses multi-view 2D image features onto 3D point clouds. Per-pixel features from a large-scale open-vocabulary 2D model (such as OpenSeg or LSeg) are projected onto the 3D points, combining the semantic richness of 2D vision models with the geometric context of 3D data. Because the resulting per-point features share an embedding space with CLIP text features, a scene can be queried with arbitrary text describing objects, properties, or activities.
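
The fusion step described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes that each 3D point has already been projected into every view (integer pixel coordinates plus a visibility mask) and that fusion is a plain average over the views that see the point. The function name `fuse_multiview_features` and its argument layout are hypothetical.

```python
import numpy as np

def fuse_multiview_features(feature_maps, pixel_uv, visible):
    """Average per-pixel 2D features over the views in which each 3D point is visible.

    feature_maps: (V, H, W, C) per-pixel features from a 2D model (assumed given)
    pixel_uv:     (V, N, 2) integer (row, col) projection of each point per view
    visible:      (V, N) boolean mask, True where the point is seen in that view
    Returns an (N, C) array; points seen in no view keep an all-zero feature.
    """
    num_views, num_points = visible.shape
    channels = feature_maps.shape[-1]
    feature_sum = np.zeros((num_points, channels), dtype=np.float64)
    view_count = np.zeros(num_points, dtype=np.int64)

    for v in range(num_views):
        idx = np.nonzero(visible[v])[0]          # points visible in view v
        rows = pixel_uv[v, idx, 0]
        cols = pixel_uv[v, idx, 1]
        feature_sum[idx] += feature_maps[v, rows, cols]
        view_count[idx] += 1

    seen = view_count > 0
    feature_sum[seen] /= view_count[seen, None]  # mean over contributing views
    return feature_sum
```

In practice the visibility mask would come from camera intrinsics, poses, and an occlusion test against depth; those steps are omitted here.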

Quick Start & Requirements

  • Installation: Follow installation.md.
  • Prerequisites: PyTorch; CUDA (not stated explicitly, but presumably needed for feature extraction and training); and one or more of the supported datasets (ScanNet, Matterport3D, nuScenes, Replica).
  • Data: Pre-processed datasets and multi-view fused features can be downloaded with scripts/download_dataset.sh and scripts/download_fused_features.sh. The downloads are large (e.g., 234.8G for ScanNet features).
  • Demo: An interactive, real-time demo is available, requiring no GPU.
  • Links: Paper, Video, Project Page.
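
Given the download sizes listed above, it may be worth checking free disk space before running the download scripts. The helper below is a hypothetical convenience, not part of the repository; it assumes the quoted "234.8G" means decimal gigabytes and adds a 10% safety margin.

```python
import shutil

def has_free_space(path, required_gb, margin=1.1):
    """Return True if `path` has at least required_gb * margin gigabytes free.

    Assumes decimal gigabytes (1 GB = 1e9 bytes); the margin is an arbitrary
    allowance for temporary archives and extraction overhead.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * margin * 1e9

if __name__ == "__main__":
    # 234.8 is the ScanNet fused-feature size quoted in the summary.
    if not has_free_space(".", 234.8):
        print("Not enough free space here for the ScanNet fused features.")
```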

Highlighted Details

  • Supports zero-shot semantic segmentation with arbitrary text queries (e.g., "snoopy", "soft", "metal", "cooking").
  • Enables open-vocabulary 3D scene exploration beyond fixed semantic labels.
  • Provides pre-processed data and fused features for multiple popular 3D datasets.
  • Includes code for distillation to train custom 3D models.
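
The text-query and distillation ideas listed above can be sketched together. This is an illustration under stated assumptions, not the repository's code: it assumes per-point features and text embeddings already live in the same embedding space (the text embeddings would come from the text encoder paired with the 2D model, which is not shown), that queries are answered by cosine similarity, and that distillation minimizes one minus the cosine similarity between the 3D model's output and the fused 2D features. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length, guarding against zero vectors."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def label_points(point_features, text_embeddings):
    """Assign each point the index of its most similar text embedding.

    point_features:  (N, C) per-point features
    text_embeddings: (K, C) one embedding per text query
    Returns (labels, similarities) with shapes (N,) and (N, K).
    """
    sims = l2_normalize(point_features) @ l2_normalize(text_embeddings).T
    return sims.argmax(axis=1), sims

def cosine_distillation_loss(pred_3d, fused_2d):
    """Mean (1 - cosine similarity) between 3D predictions and fused 2D targets."""
    cos = np.sum(l2_normalize(pred_3d) * l2_normalize(fused_2d), axis=-1)
    return float(np.mean(1.0 - cos))
```

A query such as "soft" or "cooking" then amounts to embedding the text, calling `label_points` (or thresholding the similarity column for that query), and coloring the matching points.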

Maintenance & Community

  • Code released March 2023, with recent updates in October 2023 for LSeg feature extraction.
  • TODO list indicates ongoing development, including support for arbitrary scenes and web demos.
  • Contributions are welcomed.

Licensing & Compatibility

  • The README does not explicitly state a license. Code is provided for research purposes.

Limitations & Caveats

  • The project relies heavily on large pre-processed datasets and fused features, requiring significant storage and download time.
  • Some multi-view fused features (e.g., LSeg for Matterport/nuScenes, OpenSeg for nuScenes) are noted as "coming soon" or missing.
  • The project accompanies a CVPR 2023 paper and is research code; it should not be assumed to be production-ready.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 5 stars in the last 30 days
