Code release for open-vocabulary 3D scene graphs
ConceptGraphs provides an open-vocabulary 3D scene graph generation system for robots and perception researchers. It enables detailed scene understanding by creating object-centric 3D maps with semantic relationships, facilitating tasks like navigation and planning.
How It Works
The system integrates several advanced AI models: GradSLAM for 3D reconstruction, Grounded-SAM for open-vocabulary object detection and segmentation, and LLaVA for generating textual descriptions and relationships between objects. This pipeline first performs 3D mapping, then extracts object-level features and captions, and finally constructs a semantic scene graph.
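The mapping → captioning → graph-construction flow above can be sketched with a minimal data structure. This is an illustrative sketch only, not the ConceptGraphs API: all class and field names here are hypothetical, and the captions, embeddings, and relation labels stand in for outputs that the real pipeline would obtain from GradSLAM, Grounded-SAM, and LLaVA.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an object-centric scene graph; names are
# illustrative and do not match the ConceptGraphs codebase.

@dataclass
class SceneObject:
    object_id: int
    caption: str                           # e.g. text from a captioner such as LLaVA
    feature: list[float]                   # open-vocabulary embedding (e.g. CLIP-style)
    centroid: tuple[float, float, float]   # 3D position from the reconstructed map

@dataclass
class SceneGraph:
    objects: dict[int, SceneObject] = field(default_factory=dict)
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (src, dst, relation)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.object_id] = obj

    def relate(self, src: int, dst: int, relation: str) -> None:
        # In the real system, the relation text would come from an LLM
        # reasoning over the two objects' captions and geometry.
        self.edges.append((src, dst, relation))

# Example: two detected objects and one inferred spatial relationship.
graph = SceneGraph()
graph.add_object(SceneObject(0, "a wooden table", [0.1, 0.3], (1.0, 0.0, 0.5)))
graph.add_object(SceneObject(1, "a coffee mug", [0.2, 0.7], (1.1, 0.1, 0.9)))
graph.relate(1, 0, "on top of")

print(len(graph.objects), len(graph.edges))  # → 2 1
```

Keeping objects as nodes and relations as labeled edges is what makes the map queryable for downstream navigation and planning tasks.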
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is associated with researchers from institutions including MIT and CMU. A real-time, streamlined re-implementation is available on the ali-dev branch.
Licensing & Compatibility
The primary repository is not explicitly licensed in the README. Dependencies like PyTorch3D and Grounded-Segment-Anything have their own licenses (e.g., PyTorch3D is BSD-style). Commercial use may require careful review of all component licenses.
Limitations & Caveats
The README notes that later commits of Grounded-SAM and LLaVA may require adaptations. Performance on AI2Thor may be worse due to domain gap, and quantitative evaluation on AI2Thor was not performed. GPT-3.5 is noted to produce inconsistent results for scene graph generation.