Research code for ETPNav: vision-language navigation in continuous environments
ETPNav addresses vision-language navigation (VLN) in continuous environments, a challenging task requiring agents to plan long-range paths and avoid obstacles based on visual observations and textual instructions. It targets researchers and engineers working on embodied AI and robotics, offering a robust framework that improves on prior state-of-the-art methods by roughly 10% to 20% on benchmark datasets.
How It Works
ETPNav employs a two-stage approach: topological planning and obstacle-avoiding control. It constructs an online topological map by self-organizing waypoints encountered during traversal, enabling high-level planning independent of prior environmental knowledge. A transformer-based cross-modal planner generates navigation sequences from this map and instructions. Low-level control is handled by a trial-and-error heuristic to avoid static and dynamic obstacles. This modular design separates planning from control, enhancing robustness and adaptability.
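The split described above can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in: `TopoMap`, `plan_subgoal`, and `tryout_control` are invented names, a random score replaces the actual transformer-based cross-modal planner, and the collision check is simulated. It shows the planning/control interface, not ETPNav's implementation.

```python
import random
from dataclasses import dataclass, field

@dataclass
class TopoMap:
    """Online topological map: nodes are self-organized waypoints (hypothetical)."""
    nodes: list = field(default_factory=list)   # waypoint positions (x, y)
    edges: dict = field(default_factory=dict)   # adjacency: node index -> neighbors

    def add_waypoints(self, observed, merge_radius=1.0):
        """Self-organize: only keep a waypoint if no existing node is nearby."""
        for wp in observed:
            near = [i for i, n in enumerate(self.nodes)
                    if (n[0] - wp[0]) ** 2 + (n[1] - wp[1]) ** 2 < merge_radius ** 2]
            if not near:
                self.nodes.append(wp)
                self.edges.setdefault(len(self.nodes) - 1, set())

def plan_subgoal(topo_map, instruction):
    """Stand-in for the cross-modal planner: score each map node against the
    instruction and pick the best. A random score replaces real attention."""
    scores = [random.random() for _ in topo_map.nodes]
    return max(range(len(scores)), key=scores.__getitem__)

def tryout_control(agent_pos, subgoal, step=0.25, max_tries=4):
    """Trial-and-error low-level controller: attempt a step toward the subgoal;
    on a (simulated) collision, perturb the target and retry."""
    for _ in range(max_tries):
        dx, dy = subgoal[0] - agent_pos[0], subgoal[1] - agent_pos[1]
        blocked = random.random() < 0.3   # simulated obstacle check
        if not blocked:
            norm = max((dx ** 2 + dy ** 2) ** 0.5, 1e-6)
            return (agent_pos[0] + step * dx / norm, agent_pos[1] + step * dy / norm)
        subgoal = (subgoal[0] + random.uniform(-0.5, 0.5),
                   subgoal[1] + random.uniform(-0.5, 0.5))
    return agent_pos  # give up this step

# One navigation step: map -> plan -> control
m = TopoMap()
m.add_waypoints([(0.0, 0.0), (1.2, 0.4), (2.5, 1.0)])
goal = plan_subgoal(m, "walk past the sofa and stop at the door")
pos = tryout_control((0.0, 0.0), m.nodes[goal])
print("moved to", pos)
```

The point of the split is that collisions are absorbed entirely by the low-level loop, so the planner can reason over the graph as if it were obstacle-free.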
Quick Start & Requirements
gym==0.21.0 is a critical dependency; see requirements.txt for the full dependency list.
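Since the pinned gym version is easy to get wrong, a small sanity check at startup can catch a mismatch early. This is a generic sketch, not part of the ETPNav codebase; it assumes only that gym exposes `__version__`:

```python
import gym

# The setup expects exactly gym 0.21.0 (compatible with Habitat-lab v0.1.7).
if gym.__version__ != "0.21.0":
    raise RuntimeError(
        f"Found gym {gym.__version__}; install the pinned version with "
        "`pip install gym==0.21.0` (see requirements.txt)."
    )
```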
Highlighted Details
Maintenance & Community
The project is associated with authors from multiple institutions. Contact information for key contributors is provided. The README mentions inspirations from CWP, Sim2Sim, and DUET.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is marked as "Official repo" and has achieved significant benchmark improvements, suggesting a stable implementation. However, specific details on community support, ongoing maintenance, or potential deprecations are not provided in the README. The setup requires a specific, older version of gym (0.21.0) due to compatibility issues with Habitat-lab v0.1.7.
Last updated 4 months ago; repository activity is marked inactive.