HOV-SG by hovsg

3D scene graph for language-grounded robot navigation research

Created 1 year ago

431 stars

Top 68.9% on SourcePulse

Project Summary

HOV-SG provides an official implementation for hierarchical open-vocabulary 3D scene graphs, enabling language-grounded robot navigation in complex indoor environments. It targets researchers and engineers in robotics and AI, offering a structured, compact representation of 3D scenes that surpasses dense maps in efficiency and semantic accuracy.

How It Works

HOV-SG constructs a hierarchical scene graph (floor, room, object) from RGB-D data. It leverages OpenCLIP for open-vocabulary feature extraction and SAM for class-agnostic segmentation, creating detailed, multi-level scene representations. This hierarchical approach allows for efficient storage and enables navigation across multiple floors using a cross-floor Voronoi graph.

Quick Start & Requirements

Install: Clone repo, create conda environment (environment.yaml), install habitat-sim (conda install habitat-sim -c conda-forge -c aihabitat), install HOV-SG package (pip install -e .).
Dependencies: OpenCLIP (CLIP-ViT-H-14-laion2B-s32B-b79K checkpoint), SAM (sam_vit_h_4b8939.pth checkpoint).
Datasets: Habitat Matterport 3D Semantics (HM3DSem), ScanNet, Replica. Requires specific data structures and potentially ~128 GB RAM for ground truth generation.
Links: Open CLIP, SAM, Habitat-Sim.

Highlighted Details

Achieves state-of-the-art open-vocabulary semantic accuracy at object, room, and floor levels.
Reduces representation size by 75% compared to dense open-vocabulary maps.
Demonstrates successful long-horizon language-conditioned navigation in real-world multi-story environments.
Supports cross-floor navigation via a Voronoi graph.

Maintenance & Community

Initial release in July 2024, with updates in August 2024 adding dataset generation and evaluation code. No community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

MIT license for academic usage. Commercial use requires contacting the authors.

Limitations & Caveats

The README mentions a recommendation of 128 GB RAM for compiling ground truth data, indicating a potentially high resource requirement for dataset preparation. Specific scenes are listed for evaluation, suggesting broader dataset compatibility may require further investigation.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Pull Requests (30d)

Issues (30d)

Star History

12 stars in the last 30 days