vlmaps: spatial mapping research for robot navigation using visual-language features
VLMaps enables robots to navigate from natural language commands by fusing pre-trained visual-language model features into a 3D reconstruction of the environment. This allows zero-shot spatial goal navigation and landmark localization without additional data collection or model fine-tuning. The project targets robotics researchers and developers.
How It Works
VLMaps builds a spatial map by integrating visual-language features from pre-trained models into a 3D reconstruction of the environment. Because the features are anchored spatially, the map can be indexed with natural language, letting robots interpret and act on text-based navigation goals. The system uses the Matterport3D dataset and the Habitat simulator to generate and evaluate these maps.
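As a rough illustration of this idea (not the project's actual API), the sketch below back-projects hypothetical per-pixel visual-language features into a top-down grid map and then localizes a landmark by comparing map cells against a text-query embedding. All function and parameter names here are assumptions for illustration.

```python
# Illustrative sketch, assuming per-pixel features from a visual-language model
# (e.g. LSeg/CLIP), metric depth, camera intrinsics K, and a camera-to-world pose.
import numpy as np

def fuse_features_into_map(pixel_features, depth, K, cam_pose,
                           grid_map, counts, cell_size=0.05, origin=(0.0, 0.0)):
    """Accumulate per-pixel feature vectors into a 2D grid via depth back-projection.

    pixel_features: (H, W, D) visual-language features per pixel.
    depth:          (H, W) metric depth in meters.
    K:              (3, 3) camera intrinsics.
    cam_pose:       (4, 4) camera-to-world transform.
    grid_map:       (GH, GW, D) running sum of features per cell.
    counts:         (GH, GW) number of observations per cell.
    """
    H, W, D = pixel_features.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    # Back-project pixels to camera coordinates, then transform to the world frame.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_world = (cam_pose @ pts_cam.T).T[:, :3]
    # Drop each point's feature into the top-down grid cell it falls in.
    gx = ((pts_world[:, 0] - origin[0]) / cell_size).astype(int)
    gy = ((pts_world[:, 1] - origin[1]) / cell_size).astype(int)
    feats = pixel_features.reshape(-1, D)
    valid = ((gx >= 0) & (gx < grid_map.shape[1]) &
             (gy >= 0) & (gy < grid_map.shape[0]) & (z.reshape(-1) > 0))
    for i in np.flatnonzero(valid):
        grid_map[gy[i], gx[i]] += feats[i]
        counts[gy[i], gx[i]] += 1
    return grid_map, counts

def localize_landmark(grid_map, counts, text_embedding):
    """Score each observed map cell against a text embedding (cosine similarity)."""
    mean_feats = grid_map / np.maximum(counts[..., None], 1)
    norms = np.linalg.norm(mean_feats, axis=-1) * np.linalg.norm(text_embedding)
    scores = (mean_feats @ text_embedding) / np.maximum(norms, 1e-8)
    scores[counts == 0] = -1.0  # ignore unobserved cells
    return np.unravel_index(np.argmax(scores), scores.shape)  # (row, col) of best match
```

The point the sketch illustrates is that, because map cells carry visual-language embeddings, an open-vocabulary text query can be matched against the map directly, without retraining or extra data collection.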
Quick Start & Requirements
Create and activate a conda environment with conda create -n vlmaps python=3.8 and conda activate vlmaps, then install dependencies with bash install.bash. To try the demo, run git checkout demo and open the notebook with jupyter notebook demo.ipynb.
Maintenance & Community
The project accompanies an ICRA 2023 paper and seeks community contributions to improve the navigation stack. The repository was last updated about a year ago and is currently inactive.
Licensing & Compatibility
MIT License, permitting commercial use and integration with closed-source systems.
Limitations & Caveats
The current navigation stack relies on a covisibility graph built from obstacle maps, which can cause navigation failures in complex environments. The project is seeking community contributions to address these limitations and to integrate with real-world robot sensors.
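To make the caveat concrete, here is an illustrative sketch that uses a simplified grid traversability graph as a stand-in for the project's covisibility graph; the networkx-based approach and all names are assumptions for illustration, not the project's code. It shows how planning over a graph derived purely from an obstacle map fails once the map closes off a region.

```python
# Sketch only: a grid graph built from a binary obstacle map, not VLMaps' actual stack.
import numpy as np
import networkx as nx

def build_nav_graph(obstacle_map):
    """obstacle_map: (H, W) bool array, True where a cell is blocked."""
    H, W = obstacle_map.shape
    g = nx.grid_2d_graph(H, W)  # 4-connected grid of free and blocked cells
    blocked = [tuple(c) for c in np.argwhere(obstacle_map)]
    g.remove_nodes_from(blocked)  # traversability graph keeps only free cells
    return g

def plan(obstacle_map, start, goal):
    g = build_nav_graph(obstacle_map)
    try:
        return nx.shortest_path(g, start, goal)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None  # goal unreachable given the current obstacle map

# Example: an obstacle row that severs the graph, so no path is found.
occ = np.zeros((5, 5), dtype=bool)
occ[2, :] = True                    # a wall across the map
print(plan(occ, (0, 0), (4, 4)))    # -> None
```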