HOV-SG  by hovsg

3D scene graph for language-grounded robot navigation research

created 1 year ago
335 stars

Top 83.1% on sourcepulse

Project Summary

HOV-SG provides an official implementation for hierarchical open-vocabulary 3D scene graphs, enabling language-grounded robot navigation in complex indoor environments. It targets researchers and engineers in robotics and AI, offering a structured, compact representation of 3D scenes that surpasses dense maps in efficiency and semantic accuracy.

How It Works

HOV-SG constructs a hierarchical scene graph (floor, room, object) from RGB-D data. It leverages OpenCLIP for open-vocabulary feature extraction and SAM for class-agnostic segmentation, creating detailed, multi-level scene representations. This hierarchical approach allows for efficient storage and enables navigation across multiple floors using a cross-floor Voronoi graph.
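As a rough illustration of the floor, room, object hierarchy described above, the following is a minimal sketch and not the HOV-SG implementation. The `Node` class, the `resolve` function, and the three-dimensional toy feature vectors are all hypothetical; in the real system the features would be OpenCLIP embeddings of much higher dimension. The sketch resolves a query by descending one level at a time and picking the child whose feature is most similar to the query for that level.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """One node of the hierarchy: a floor, a room, or an object (hypothetical)."""
    name: str
    level: str                      # "building", "floor", "room", or "object"
    feature: list[float]            # toy stand-in for an open-vocabulary embedding
    children: list["Node"] = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve(root, queries):
    """Descend floor -> room -> object, keeping the best match at each level."""
    path, node = [], root
    for query in queries:
        node = max(node.children, key=lambda c: cosine(c.feature, query))
        path.append(node.name)
    return path


# Toy scene: two floors, a few rooms, a few objects (all values made up).
building = Node("building", "building", [0.0, 0.0, 0.0], [
    Node("floor 0", "floor", [1.0, 0.0, 0.0], [
        Node("kitchen", "room", [0.9, 0.1, 0.0], [
            Node("fridge", "object", [1.0, 0.0, 0.2]),
            Node("sink", "object", [0.0, 1.0, 0.2]),
        ]),
        Node("hallway", "room", [0.1, 0.9, 0.0]),
    ]),
    Node("floor 1", "floor", [0.0, 1.0, 0.0], [
        Node("office", "room", [1.0, 0.0, 0.0], [
            Node("desk", "object", [1.0, 0.0, 0.0]),
        ]),
    ]),
])

# One toy query vector per level: floor, room, object.
result = resolve(building, [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(result)  # ['floor 0', 'kitchen', 'sink']
```

Because each step only compares the query against the children of one node, the search touches a small part of the graph, which is one intuition for why a hierarchy can be cheaper to query than a dense map.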

Quick Start & Requirements

  • Install: clone the repository, create the conda environment from environment.yaml, install habitat-sim (conda install habitat-sim -c conda-forge -c aihabitat), then install the HOV-SG package in editable mode (pip install -e .).
  • Dependencies: OpenCLIP (CLIP-ViT-H-14-laion2B-s32B-b79K checkpoint), SAM (sam_vit_h_4b8939.pth checkpoint).
  • Datasets: Habitat Matterport 3D Semantics (HM3DSem), ScanNet, Replica. Each dataset must be arranged in the directory structure the code expects, and ground truth generation may need about 128 GB of RAM.
  • Links: OpenCLIP, SAM, Habitat-Sim.
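The two model dependencies above could be loaded along the following lines. This is a sketch under assumptions, not the loading code used by HOV-SG: it assumes the `open_clip` and `segment_anything` Python packages are installed, that the OpenCLIP checkpoint corresponds to the `ViT-H-14` architecture with the `laion2b_s32b_b79k` pretrained tag, and that the SAM file sits in a local directory of your choosing. The helper names `sam_checkpoint_path` and `load_models` are hypothetical.

```python
from pathlib import Path

# Names derived from the dependency list above; the mapping of the OpenCLIP
# checkpoint name to this architecture/tag pair is an assumption.
OPENCLIP_ARCH = "ViT-H-14"
OPENCLIP_PRETRAINED = "laion2b_s32b_b79k"
SAM_CHECKPOINT = "sam_vit_h_4b8939.pth"


def sam_checkpoint_path(checkpoint_dir):
    """Return the SAM checkpoint path, or raise if the file is missing."""
    path = Path(checkpoint_dir) / SAM_CHECKPOINT
    if not path.is_file():
        raise FileNotFoundError(f"expected SAM checkpoint at {path}")
    return path


def load_models(checkpoint_dir):
    """Load OpenCLIP and a SAM mask generator (hypothetical helper)."""
    # Imported lazily so the path check above works without the heavy packages.
    import open_clip
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    clip_model, _, preprocess = open_clip.create_model_and_transforms(
        OPENCLIP_ARCH, pretrained=OPENCLIP_PRETRAINED
    )
    sam = sam_model_registry["vit_h"](
        checkpoint=str(sam_checkpoint_path(checkpoint_dir))
    )
    return clip_model, preprocess, SamAutomaticMaskGenerator(sam)
```

Calling `sam_checkpoint_path("checkpoints")` before any heavy import gives an early, readable error when the SAM file has not been downloaded yet.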

Highlighted Details

  • Achieves state-of-the-art open-vocabulary semantic accuracy at object, room, and floor levels.
  • Reduces representation size by 75% compared to dense open-vocabulary maps.
  • Demonstrates successful long-horizon language-conditioned navigation in real-world multi-story environments.
  • Supports cross-floor navigation via a Voronoi graph.
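To make the cross-floor navigation idea concrete, here is a small sketch that plans a route over a hand-written graph whose floors are joined by a single staircase edge. It is not the HOV-SG planner: the graph below is a toy stand-in for the Voronoi graph, the node names and edge costs are invented, and Dijkstra's algorithm is used here only as a familiar shortest-path method.

```python
import heapq


def add_edge(graph, a, b, cost):
    """Insert an undirected edge into an adjacency dict {node: {neighbor: cost}}."""
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (total cost, list of nodes) or (inf, [])."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Toy two-floor graph; the "f0/" and "f1/" prefixes mark the floor.
graph = {}
add_edge(graph, "f0/kitchen", "f0/hall", 3.0)
add_edge(graph, "f0/hall", "f0/stairs", 2.0)
add_edge(graph, "f0/stairs", "f1/stairs", 4.0)   # the only cross-floor edge
add_edge(graph, "f1/stairs", "f1/hall", 2.0)
add_edge(graph, "f1/hall", "f1/office", 3.0)

cost, path = shortest_path(graph, "f0/kitchen", "f1/office")
print(cost)  # 14.0
print(path)
```

Since the staircase edge is the only link between the two floors, every route from a floor 0 node to a floor 1 node has to pass through it, which is the property a cross-floor graph relies on.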

Maintenance & Community

Initial release in July 2024, with updates in August 2024 adding dataset generation and evaluation code. No community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

MIT license for academic usage. Commercial use requires contacting the authors.

Limitations & Caveats

The README recommends 128 GB of RAM for generating ground truth data, so dataset preparation can be resource-intensive. Only specific scenes are listed for evaluation, so compatibility with other scenes or datasets may need further investigation.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 52 stars in the last 90 days
