concept-graphs by concept-graphs

Code release for open-vocabulary 3D scene graphs

created 1 year ago
637 stars

Top 53.0% on sourcepulse

Project Summary

ConceptGraphs provides an open-vocabulary 3D scene graph generation system for robots and perception researchers. It enables detailed scene understanding by creating object-centric 3D maps with semantic relationships, facilitating tasks like navigation and planning.

How It Works

The system combines several models: GradSLAM for 3D reconstruction, Grounded-SAM for open-vocabulary object detection and segmentation, and LLaVA for generating textual descriptions of objects and the relationships between them. An OpenAI model (GPT-4 recommended) is also used during scene graph generation. The pipeline first performs 3D mapping, then extracts object-level features and captions, and finally constructs a semantic scene graph.
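To make the end product of this pipeline concrete, here is a minimal sketch of an object-centric scene graph: objects are nodes carrying a caption and a position, and relationships are labeled directed edges. The class names, fields, and relation labels are hypothetical and chosen for illustration; they are not taken from the ConceptGraphs code base.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One mapped object: a node in the graph (hypothetical schema)."""
    object_id: int
    caption: str                          # text such as a captioning model might produce
    centroid: tuple[float, float, float]  # object position in map coordinates


@dataclass
class SceneGraph:
    """Objects as nodes, labeled relationships as directed edges."""
    objects: dict[int, SceneObject] = field(default_factory=dict)
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.object_id] = obj

    def relate(self, subject_id: int, relation: str, object_id: int) -> None:
        # Refuse edges that point at objects missing from the map.
        if subject_id not in self.objects or object_id not in self.objects:
            raise KeyError("both objects must be added before relating them")
        self.edges.append((subject_id, relation, object_id))

    def describe(self) -> list[str]:
        # Render each edge as "<subject caption> <relation> <object caption>".
        return [
            f"{self.objects[s].caption} {rel} {self.objects[o].caption}"
            for s, rel, o in self.edges
        ]


graph = SceneGraph()
graph.add_object(SceneObject(0, "a ceramic mug", (1.0, 0.5, 0.8)))
graph.add_object(SceneObject(1, "a wooden table", (1.0, 0.5, 0.4)))
graph.relate(0, "on", 1)
print(graph.describe())
```

A structure like this is what makes the map usable for planning: a downstream component can query captions and relations as text instead of working with raw point clouds.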

Quick Start & Requirements

  • Installation: Requires Python 3.10.12, PyTorch (tested with 2.0.1), PyTorch3D, Faiss-CPU, GradSLAM, and Grounded-Segment-Anything.
  • Dependencies: CUDA toolkit (tested with 11.8), OpenAI API key (GPT-4 recommended), and specific model checkpoints (Grounded-DINO, SAM, etc.).
  • Dataset: Tested with Replica and AI2Thor datasets. Setup involves downloading specific dataset formats and configuring environment variables.
  • Setup Time: The extensive dependency list and model downloads suggest a setup time of several hours.
  • Documentation: Project Page, Paper, ArXiv, Video Tutorial, and GitHub repositories for dependencies are linked.
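Since the requirements above span several packages and an API key, a quick preflight check can save time before a long setup. The following is a rough sketch, not part of the project: the import names (`torch`, `pytorch3d`, `faiss`, `gradslam`) and the `OPENAI_API_KEY` variable name are assumptions based on the usual conventions of those libraries, and the script only checks presence, not versions or CUDA.

```python
import importlib.util
import os
import sys

# Assumed import names for the packages listed in the requirements.
REQUIRED_MODULES = ["torch", "pytorch3d", "faiss", "gradslam"]


def check_environment() -> dict[str, bool]:
    """Return a pass/fail report for the basic prerequisites."""
    report = {
        # The listing names Python 3.10.12; only major.minor is checked here.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # Assumed variable name; True only if set to a non-empty value.
        "openai_api_key_set": bool(os.environ.get("OPENAI_API_KEY")),
    }
    for name in REQUIRED_MODULES:
        # find_spec locates a top-level module without importing it.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {item}")
```

Model checkpoints and dataset paths are not covered here because their locations depend on how the environment variables are configured.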

Highlighted Details

  • Supports both class-agnostic (ConceptGraphs) and class-aware (ConceptGraphs-Detect) segmentation using Grounded-SAM.
  • Enables interactive visualization of the mapping process and final scene graphs with rich callbacks.
  • Includes evaluation scripts for semantic segmentation on the Replica dataset.
  • Offers experimental support for AI2Thor datasets with various data generation methods.

Maintenance & Community

The project is associated with researchers from institutions including MIT and CMU. A real-time, streamlined re-implementation is available on the ali-dev branch.

Licensing & Compatibility

The primary repository is not explicitly licensed in the README. Dependencies like PyTorch3D and Grounded-Segment-Anything have their own licenses (e.g., PyTorch3D is BSD-style). Commercial use may require careful review of all component licenses.

Limitations & Caveats

The README notes that later commits of Grounded-SAM and LLaVA may require adaptations. Performance on AI2Thor may be worse because of the domain gap, and no quantitative evaluation was performed on AI2Thor. GPT-3.5 is reported to produce inconsistent results for scene graph generation.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 59 stars in the last 90 days
