SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
SceneVerse provides the first million-scale 3D vision-language dataset, along with the official implementation, for grounded scene understanding. It targets researchers and practitioners in 3D computer vision and natural language processing, delivering state-of-the-art performance on 3D visual grounding benchmarks and supporting zero-shot transfer to unseen scenes.
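As a concrete illustration of the 3D visual grounding task, the sketch below selects the candidate object whose embedding best matches a text query embedding by cosine similarity. The function and embeddings are hypothetical stand-ins for what a grounded vision-language model produces; this is not SceneVerse's actual API.

```python
import numpy as np

def ground_query(object_embeddings, query_embedding):
    """Return the index of the scene object best matching a text query.

    Illustrative only: the embeddings here stand in for the object and
    text features a grounded vision-language model would produce.
    """
    # L2-normalize so dot products become cosine similarities
    objs = object_embeddings / np.linalg.norm(object_embeddings, axis=1, keepdims=True)
    query = query_embedding / np.linalg.norm(query_embedding)
    scores = objs @ query  # cosine similarity of each object to the query
    return int(np.argmax(scores))
```

Given embeddings for, say, a chair, a table, and a lamp, the embedding of the query "the lamp next to the bed" should score highest against the lamp's embedding, and its index is returned.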
How It Works
SceneVerse leverages a large-scale dataset comprising 68K 3D indoor scenes and 2.5M vision-language pairs. The core approach involves a GPS (Grounded Pre-training for Scenes) model, which is pre-trained on this extensive dataset. This pre-training strategy is designed to capture rich semantic relationships between 3D environments and textual descriptions, enabling robust generalization and zero-shot transfer to downstream tasks.
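To make the pre-training idea concrete, the sketch below implements an InfoNCE-style contrastive loss that pulls matched scene-text embedding pairs together and pushes mismatched pairs apart. This is a generic illustration of scene-text contrastive alignment, not the GPS model's exact objective; consult the SceneVerse paper for the actual loss formulation.

```python
import numpy as np

def contrastive_alignment_loss(scene_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over matched scene/text embedding pairs.

    Illustrative sketch of scene-text contrastive alignment; the GPS
    model's actual pre-training objective is defined in the paper.
    """
    # L2-normalize so dot products are cosine similarities
    scene_emb = scene_emb / np.linalg.norm(scene_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = scene_emb @ text_emb.T / temperature  # (N, N); matches on diagonal
    labels = np.arange(len(scene_emb))

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the scene-to-text and text-to-scene directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

At million-pair scale, an objective of this shape is what lets the pre-trained model transfer zero-shot: any new scene or description can be embedded and compared in the shared space.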
Quick Start & Requirements
See DATA.md for dataset preparation and TRAIN.md for detailed instructions on training and inference.
Maintenance & Community
The project is associated with the ECCV 2024 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding." Pre-trained checkpoints and training/inference code were released in late 2024.
Licensing & Compatibility
The README does not explicitly state a license. Because the project heavily adapts code and data from other open-source datasets and projects, users should review those upstream licenses for compatibility before redistribution or commercial use.
Limitations & Caveats
The dataset includes "template" entries for HM3D and Structured3D, indicating that not all data modalities or annotations might be fully available or processed for these specific datasets. The project is relatively new, with code and checkpoints released in mid-to-late 2024.