openscene by pengsongyou

3D scene understanding research paper using open-vocabulary queries

Created 2 years ago

797 stars

Top 44.3% on SourcePulse

Project Summary

OpenScene provides a zero-shot framework for 3D scene understanding using open-vocabulary queries, enabling tasks like semantic segmentation, rare object search, and image-based object detection. It targets researchers and practitioners in 3D computer vision who need flexible scene analysis beyond predefined categories.

How It Works

OpenScene fuses per-pixel features from multiple 2D views onto the points of a 3D point cloud. This combines the rich semantic understanding of large-scale 2D vision models (like OpenSeg or LSeg) with the geometric context of 3D data. Because those 2D features are aligned with a CLIP text embedding space, each 3D point can be compared directly against a text embedding, which allows open-vocabulary queries based on arbitrary text descriptions, properties, or activities.
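
The fusion step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names `project_point` and `fuse_multiview_features` are hypothetical, points are assumed to be already expressed in each camera's frame, a single pinhole intrinsics tuple is shared by all views, fusion is a plain average, and occlusion checks are omitted. It uses plain Python lists for readability where a real pipeline would use array operations.

```python
from typing import List, Optional, Sequence, Tuple

Vec = List[float]
Point = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def project_point(point: Point, intrinsics: Intrinsics,
                  width: int, height: int) -> Optional[Tuple[int, int]]:
    """Project a camera-frame 3D point to integer pixel coordinates.

    Returns None if the point is behind the camera or lands outside the image.
    """
    x, y, z = point
    if z <= 0:
        return None
    fx, fy, cx, cy = intrinsics
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    if 0 <= u < width and 0 <= v < height:
        return u, v
    return None


def fuse_multiview_features(points_per_view: Sequence[Sequence[Point]],
                            feature_maps: Sequence[Sequence[Sequence[Vec]]],
                            intrinsics: Intrinsics) -> List[Optional[Vec]]:
    """Average the 2D features of every view in which a point is visible.

    points_per_view[k][i] is point i in the frame of camera k.
    feature_maps[k][v][u] is the feature vector at pixel (u, v) of view k.
    Points seen by no view get None. Occlusion is NOT handled in this sketch.
    """
    num_points = len(points_per_view[0])
    fused: List[Optional[Vec]] = [None] * num_points
    for i in range(num_points):
        total: Optional[Vec] = None
        count = 0
        for cam_points, fmap in zip(points_per_view, feature_maps):
            height, width = len(fmap), len(fmap[0])
            pixel = project_point(cam_points[i], intrinsics, width, height)
            if pixel is None:
                continue
            u, v = pixel
            feat = fmap[v][u]
            total = list(feat) if total is None else [a + b for a, b in zip(total, feat)]
            count += 1
        if total is not None:
            fused[i] = [t / count for t in total]
    return fused
```

In practice the feature maps would come from a 2D model such as OpenSeg or LSeg, and each point would also be checked against a depth map so that features from occluded surfaces are not mixed in.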

Quick Start & Requirements

Installation: Follow installation.md.
Prerequisites: PyTorch, CUDA (implied for feature extraction/training), specific datasets (ScanNet, Matterport3D, nuScenes, Replica).
Data: Pre-processed datasets and multi-view fused features are available for download via scripts/download_dataset.sh and scripts/download_fused_features.sh. These downloads can be large (e.g., 234.8 GB for the ScanNet features).
Demo: An interactive, real-time demo is available, requiring no GPU.
Links: Paper, Video, Project Page.

Highlighted Details

Supports zero-shot semantic segmentation with arbitrary text queries (e.g., "snoopy", "soft", "metal", "cooking").
Enables open-vocabulary 3D scene exploration beyond fixed semantic labels.
Provides pre-processed data and fused features for multiple popular 3D datasets.
Includes code for distillation to train custom 3D models.
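
The text-query and distillation items above both reduce to comparing feature vectors. The sketch below is illustrative only: `label_points` and `cosine_distillation_loss` are hypothetical names, the text embeddings are hand-made stand-ins for the output of a CLIP-style text encoder, and the cosine loss is an assumed objective chosen for illustration, since this summary does not state which loss the repository's distillation code uses.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def label_points(point_features: Sequence[Sequence[float]],
                 text_embeddings: Sequence[Sequence[float]],
                 labels: Sequence[str]) -> List[str]:
    """Assign each point the label whose text embedding is most similar to it."""
    result = []
    for feat in point_features:
        scores = [cosine_similarity(feat, t) for t in text_embeddings]
        best = max(range(len(labels)), key=scores.__getitem__)
        result.append(labels[best])
    return result


def cosine_distillation_loss(student_features: Sequence[Sequence[float]],
                             target_features: Sequence[Sequence[float]]) -> float:
    """Mean of (1 - cosine similarity) between student and target features.

    Assumed objective: a 3D model's per-point output (student) is pulled
    toward the fused 2D features (target) for the same points.
    """
    losses = [1.0 - cosine_similarity(s, t)
              for s, t in zip(student_features, target_features)]
    return sum(losses) / len(losses)


# Toy usage with 2-dimensional stand-in embeddings for two queries.
queries = ["metal", "soft"]
query_embeddings = [[1.0, 0.0], [0.0, 1.0]]
points = [[0.9, 0.1], [0.2, 0.8]]
predicted = label_points(points, query_embeddings, queries)
```

Because the label set is just a list of strings passed in at query time, swapping in new queries requires no retraining, which is what makes the segmentation zero-shot.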

Maintenance & Community

Code was released in March 2023, with updates in October 2023 for LSeg feature extraction.
The README's TODO list names planned work, including support for arbitrary scenes and web demos.
Contributions are welcomed.

Licensing & Compatibility

The README does not explicitly state a license. Code is provided for research purposes.

Limitations & Caveats

The project relies heavily on large pre-processed datasets and fused features, requiring significant storage and download time.
Some multi-view fused features (e.g., LSeg for Matterport/nuScenes, OpenSeg for nuScenes) are noted as "coming soon" or missing.
The project is associated with CVPR 2023 and is research-focused, so it should not be assumed to be production-ready.

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

12 stars in the last 30 days

Explore Similar Projects

SceneVerse by scene-verse

Scaling 3D vision-language learning for grounded scene understanding

Created 2 years ago

Updated 9 months ago

PonderV2 by OpenGVLab

3D pre-training framework for efficient 3D representations

Created 2 years ago

Updated 3 months ago

ml-cubifyanything by apple

Scaling indoor 3D object detection and spatial understanding

Created 9 months ago

Updated 2 months ago

PointCLIP_V2 by yangyangyang127

3D open-world learning research paper

Created 3 years ago

Updated 5 months ago

Awesome6DPoseEstimation by Jianqiuer

A curated collection of recent research on 6D pose estimation

Created 2 years ago

Updated 1 day ago

SAM2Point by ZiyuGuo99

3D segmentation via adapting Segment Anything Model (SAM)

Created 1 year ago

Updated 1 year ago

SG-Nav by bagh2178

LLM-powered zero-shot navigation via 3D scene graphs

Created 1 year ago

Updated 3 months ago

Segment-Any-Point-Cloud by youquanl

Framework for point cloud sequence segmentation via vision foundation model distillation

Created 2 years ago

Updated 2 years ago

Point-Bind_Point-LLM by ZiyuGuo99

3D multi-modality model aligning point clouds with language models

Created 2 years ago

Updated 2 years ago

Awesome-3D-Scene-Generation by hzxie

Curated list of 3D scene generation papers

Created 11 months ago

Updated 2 weeks ago

3D-Shape-Analysis-Paper-List by yinyunie

Curated list of 3D shape/scene analysis papers, libraries, and datasets

Created 6 years ago

Updated 2 years ago

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chuan Li (Chief Scientific Officer at Lambda), and 6 more.

3D-Machine-Learning by timzhang642

Resource list for 3D machine learning

Created 8 years ago

Updated 1 year ago
