Research paper and code for robot navigation using visual-language spatial mapping
Top 60.3% on sourcepulse
VLMaps enables robots to navigate using natural language commands by fusing pre-trained visual-language model features into 3D reconstructions of the environment. This approach allows zero-shot spatial goal navigation and landmark localization without additional data collection or model fine-tuning. The project is aimed at robotics researchers and developers.
How It Works
VLMaps builds spatial maps by integrating visual-language features from pre-trained models into a 3D reconstruction of the environment. Anchoring these features spatially enables natural-language indexing, allowing robots to understand and act on text-based navigation goals. The system uses the Matterport3D dataset and the Habitat simulator to generate and test these maps.
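To make the natural-language indexing concrete, the sketch below scores every cell of a top-down feature map against a text query embedding and picks the best match as a navigation goal. It is a minimal illustration with placeholder arrays, not the repository's code; in practice the cell features and the text embedding come from the same pre-trained visual-language model.

```python
# Minimal sketch of natural-language indexing over a VLMap-style feature grid.
# The feature grid and text embedding are random placeholders; in the real
# pipeline they come from a pre-trained visual-language model fused into the map.
import numpy as np

H, W, C = 200, 200, 512                 # top-down map resolution and feature dimension
cell_feats = np.random.randn(H, W, C)   # placeholder for fused visual-language features
text_embed = np.random.randn(C)         # placeholder for the embedding of e.g. "sofa"

# Cosine similarity between every map cell and the language query.
cell_norm = cell_feats / np.linalg.norm(cell_feats, axis=-1, keepdims=True)
text_norm = text_embed / np.linalg.norm(text_embed)
scores = cell_norm @ text_norm          # (H, W) relevance heatmap

# The highest-scoring cell becomes a candidate goal for "navigate to the sofa".
row, col = np.unravel_index(np.argmax(scores), scores.shape)
print("candidate goal cell:", (row, col))
```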
Quick Start & Requirements
Create and activate the conda environment, then run the install script:

conda create -n vlmaps python=3.8
conda activate vlmaps
bash install.bash

To try the demo, check out the demo branch and open the notebook:

git checkout demo
jupyter notebook demo.ipynb
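The demo relies on Habitat and Matterport3D assets. Below is a minimal, hypothetical sketch of loading a Matterport3D scene in habitat-sim; the scene path is a placeholder and attribute names vary slightly across habitat-sim versions, so treat it as an assumption rather than the repository's setup code.

```python
# Hypothetical sketch of loading a Matterport3D scene in habitat-sim for the demo.
# The scene path is a placeholder and attribute names vary across habitat-sim
# versions, so this is an assumption, not the repository's setup code.
import habitat_sim

backend_cfg = habitat_sim.SimulatorConfiguration()
backend_cfg.scene_id = "data/mp3d/<scene>/<scene>.glb"  # placeholder Matterport3D scene

rgb_spec = habitat_sim.CameraSensorSpec()               # one RGB camera on the agent
rgb_spec.uuid = "rgb"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR

agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_spec]

sim = habitat_sim.Simulator(habitat_sim.Configuration(backend_cfg, [agent_cfg]))
obs = sim.reset()                                       # dict keyed by sensor uuid
print(obs["rgb"].shape)
sim.close()
```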
Highlighted Details
Maintenance & Community
The project is associated with ICRA 2023 and seeks community contributions to improve the navigation stack.
Licensing & Compatibility
MIT License, permitting commercial use and integration with closed-source systems.
Limitations & Caveats
The current navigation stack's reliance on a covisibility graph built from obstacle maps can lead to navigation issues in complex environments. The project is seeking community contributions to address these limitations and integrate with real-world robot sensors.
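For intuition, the style of planning involved, a graph derived from a 2-D obstacle map, can be illustrated with a toy grid and a breadth-first search over free cells. This is only an illustration with a made-up map, not the repository's covisibility-graph implementation.

```python
# Toy illustration of planning over a graph derived from a 2-D obstacle map
# (not the repository's implementation); cells marked 1 are obstacles.
from collections import deque

import numpy as np

obstacle_map = np.array([
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])

def bfs_path(grid, start, goal):
    """Breadth-first search over free cells, 4-connected neighbourhood."""
    rows, cols = grid.shape
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk parent pointers back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == 0 \
                    and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable through free space

print(bfs_path(obstacle_map, start=(0, 0), goal=(4, 4)))
```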
Last updated about a year ago; the repository is currently inactive.