Code release for open-vocabulary 3D scene graphs
ConceptGraphs provides an open-vocabulary 3D scene graph generation system for robots and perception researchers. It enables detailed scene understanding by creating object-centric 3D maps with semantic relationships, facilitating tasks like navigation and planning.
How It Works
The system integrates several advanced AI models: GradSLAM for 3D reconstruction, Grounded-SAM for open-vocabulary object detection and segmentation, and LLaVA for generating textual descriptions and relationships between objects. This pipeline first performs 3D mapping, then extracts object-level features and captions, and finally constructs a semantic scene graph.
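The mapping → captioning → graph-construction flow above can be sketched with a minimal data structure. This is an illustrative sketch only, not the ConceptGraphs API: all class and field names here are hypothetical, and the captions, embeddings, and relation labels stand in for outputs that the real pipeline would obtain from GradSLAM, Grounded-SAM, and LLaVA.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an object-centric scene graph; names are
# illustrative and do not match the ConceptGraphs codebase.

@dataclass
class SceneObject:
    object_id: int
    caption: str                           # e.g. text from a captioner such as LLaVA
    feature: list[float]                   # open-vocabulary embedding (e.g. CLIP-style)
    centroid: tuple[float, float, float]   # 3D position from the reconstructed map

@dataclass
class SceneGraph:
    objects: dict[int, SceneObject] = field(default_factory=dict)
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (src, dst, relation)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.object_id] = obj

    def relate(self, src: int, dst: int, relation: str) -> None:
        # In the real system, the relation text would come from an LLM
        # reasoning over the two objects' captions and geometry.
        self.edges.append((src, dst, relation))

# Example: two detected objects and one inferred spatial relationship.
graph = SceneGraph()
graph.add_object(SceneObject(0, "a wooden table", [0.1, 0.3], (1.0, 0.0, 0.5)))
graph.add_object(SceneObject(1, "a coffee mug", [0.2, 0.7], (1.1, 0.1, 0.9)))
graph.relate(1, 0, "on top of")

print(len(graph.objects), len(graph.edges))  # → 2 1
```

Keeping objects as nodes and relations as labeled edges is what makes the map queryable for downstream navigation and planning tasks.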
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is associated with researchers from institutions including MIT and CMU. A real-time, streamlined re-implementation is available on the ali-dev branch.
Licensing & Compatibility
The primary repository is not explicitly licensed in the README. Dependencies like PyTorch3D and Grounded-Segment-Anything have their own licenses (e.g., PyTorch3D is BSD-style). Commercial use may require careful review of all component licenses.
Limitations & Caveats
The README notes that later commits of Grounded-SAM and LLaVA may require adaptations. Performance on AI2Thor may be worse due to domain gap, and quantitative evaluation on AI2Thor was not performed. GPT-3.5 is noted to produce inconsistent results for scene graph generation.