HA-VLN by F1y1113

Human-aware navigation in complex, interactive environments

Created 11 months ago
301 stars

Top 88.6% on SourcePulse

Summary

HA-VLN introduces a benchmark, simulator, and datasets for Human-Aware Vision-and-Language Navigation (VLN) in environments with dynamic human interactions. It targets embodied AI and robotics researchers, enabling the development and evaluation of navigation agents that coexist safely with humans and fostering socially intelligent embodied systems.

How It Works

This project extends VLN to discrete and continuous environments featuring social behaviors and multi-human dynamics. It utilizes the HA-VLN Simulator for real-time human activity rendering, powered by the HAPS 2.0 Dataset (3D human motion models, region-aware descriptions) and the HA-R2R Dataset (complex navigation instructions with human interactions). Proposed models like HA-VLN-VL and HA-CMA address visual-language understanding and dynamic decision-making. Human-scene fusion employs multi-view capture and 3D skeleton tracking for accurate placement and rendering, while LLMs generate enriched navigation instructions.
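To make the moving parts concrete, here is a toy sketch of the episode loop such a benchmark implies: humans move every tick, and success means reaching the goal without a human collision. Everything in it (the class name `ToyHAVLNSim`, the 1-D corridor, the action names) is a hypothetical stand-in, not the project's actual API.

```python
"""Hypothetical sketch of a human-aware VLN episode loop.

The HA-VLN repository does not document this exact API; every class,
method, and number below is an illustrative stand-in for the pipeline
described above (the simulator renders moving humans, the agent follows
an instruction, success = reach the goal without hitting a human).
"""
import random

class ToyHAVLNSim:
    """Stand-in simulator on a 1-D corridor with dynamic humans."""

    def __init__(self, num_humans=3, goal=10, seed=0):
        self.rng = random.Random(seed)
        self.goal = goal
        self.agent = 0
        self.humans = [self.rng.randint(1, goal - 1) for _ in range(num_humans)]

    def step(self, action):
        """Advance one tick; humans move too, as in HA-VLN's dynamic scenes."""
        if action == "move_forward":
            self.agent += 1
        self.humans = [h + self.rng.choice((-1, 0, 1)) for h in self.humans]
        collided = self.agent in self.humans
        done = action == "stop" or self.agent >= self.goal
        return {"position": self.agent, "humans": tuple(self.humans)}, collided, done

def run_episode(sim, policy):
    """Roll out a policy; success = goal reached with zero human collisions."""
    obs = {"position": sim.agent, "humans": tuple(sim.humans)}
    collided = done = False
    while not done:
        obs, hit, done = sim.step(policy(obs))
        collided = collided or hit
    return obs["position"] >= sim.goal and not collided

if __name__ == "__main__":
    human_unaware = lambda obs: "move_forward"  # naive baseline policy
    print("success:", run_episode(ToyHAVLNSim(), human_unaware))
```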

Quick Start & Requirements

Setup requires Python 3.7, habitat-lab (v0.1.7), habitat-sim (v0.1.7, headless), GroundingDINO, and PyTorch 1.9.1+cu111 (CUDA >= 11.1). Crucially, Matterport3D Dataset access is also required. Installation is complex, pinning specific library versions and demanding significant disk space for datasets, which poses a substantial setup hurdle; setup guides and dataset links are available within the repository. A version sanity check is sketched below.
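Given how tightly the stack is pinned, a short post-install sanity check can save debugging time. This is a generic sketch, not a script from the repository; the expected versions are the ones listed above, and the habitat imports are guarded because those packages do not always expose a `__version__` attribute.

```python
"""Sanity-check the pinned HA-VLN stack after installation.

Checks only versions named in the setup notes above (Python 3.7,
PyTorch 1.9.1+cu111, habitat-sim/habitat-lab 0.1.7); adjust if the
repository's own install guide says otherwise.
"""
import sys

assert sys.version_info[:2] == (3, 7), f"expected Python 3.7, got {sys.version}"

import torch
print("torch:", torch.__version__)        # expect 1.9.1+cu111
print("CUDA build:", torch.version.cuda)  # expect 11.1
print("CUDA available:", torch.cuda.is_available())

# habitat packages don't always expose __version__, so fall back gracefully.
for name in ("habitat", "habitat_sim"):
    try:
        mod = __import__(name)
        print(name, getattr(mod, "__version__", "installed (no version attr)"))
    except ImportError as exc:
        print(f"{name}: NOT importable ({exc})")
```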

Highlighted Details

  • HAPS 2.0 Dataset: Provides 486 detailed 3D human motion models (generated via MDM, the Motion Diffusion Model) across 90 scenes and 26 regions, with LLM-validated activity descriptions.
  • HA-R2R Dataset: Features complex navigation instructions enriched by LLMs for multi-human and agent-human interactions.
  • Human-Scene Fusion: Employs a 9-camera multi-view capture setup, inspired by 3D skeleton tracking, for accurate human placement and clipping correction.
  • Real-time Rendering: The simulator dynamically adds human models and recalculates navigation meshes on the fly, configurable via task YAML settings (see the navmesh sketch after this list).
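The on-the-fly navmesh recalculation maps naturally onto habitat-sim's public navmesh API. A minimal sketch, assuming the habitat-sim 0.1.x configuration layout: the scene path and agent dimensions are placeholders, and whether HA-VLN wires it exactly this way is an assumption.

```python
import habitat_sim

# Placeholder path; a real Matterport3D scene mesh is required.
SCENE = "/path/to/matterport3d/house.glb"

# habitat-sim 0.1.x configuration layout (sim_cfg.scene.id).
sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene.id = SCENE
agent_cfg = habitat_sim.agent.AgentConfiguration()
sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))

# Agent dimensions below are illustrative, not HA-VLN's actual values.
navmesh_settings = habitat_sim.NavMeshSettings()
navmesh_settings.set_defaults()
navmesh_settings.agent_radius = 0.18   # meters
navmesh_settings.agent_height = 0.88   # meters

# Rebuild the navmesh so planned paths route around newly placed humans.
ok = sim.recompute_navmesh(sim.pathfinder, navmesh_settings)
print("navmesh rebuilt:", ok)
```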

Maintenance & Community

Authored by researchers from the University of Washington, Carnegie Mellon University, and Microsoft Research. Contributions are welcomed via direct contact with the listed authors (yd2616@columbia.edu or wufengyi98@gmail.com). No community forums like Discord or Slack are mentioned.

Licensing & Compatibility

Released under the permissive MIT License, suitable for commercial use. However, strict requirements for older library versions (Python 3.7, PyTorch 1.9.1) may limit compatibility with modern development stacks.

Limitations & Caveats

The primary adoption barrier is the intricate, version-specific installation, which requires Python 3.7 and older habitat-sim/habitat-lab releases; Matterport3D dataset access is a further prerequisite. Early LLM-generated instructions had quality issues and needed iterative refinement, suggesting that maintaining robust instruction fidelity is nontrivial.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 245 stars in the last 30 days
