InternRobotics/StreamVLN: Vision-and-language navigation for real-time robotic interaction
Summary
StreamVLN enables real-time, multi-turn Vision-and-Language Navigation (VLN) from continuous video input. Extending LLaVA-Video, it models interleaved vision, language, and actions with efficient context handling for long sequences and online interaction. This project targets embodied AI and robotics researchers, offering a foundation for advanced autonomous navigation.
How It Works
Built on LLaVA-Video, StreamVLN uses slow-fast context modeling: a fast-streaming dialogue context backed by a sliding-window KV cache for immediate responses, and a slow-updating memory that prunes tokens to retain long-term context. This dual-stream design balances computational cost against the environmental understanding needed for navigation.
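A minimal sketch of this budgeting logic in plain Python, assuming invented names throughout (SlowFastContext, observe, and the top-quarter pruning heuristic are illustrative, not StreamVLN's actual API; the real model applies these ideas to KV caches and visual tokens inside the transformer):

```python
from collections import deque

class SlowFastContext:
    """Toy illustration of dual-stream context handling; all names here
    are invented for illustration, not StreamVLN's API."""

    def __init__(self, window_size: int = 8, memory_budget: int = 64):
        # Fast stream: sliding window over recent steps, standing in for
        # the sliding-window KV cache that serves low-latency responses.
        self.fast_window = deque(maxlen=window_size)
        # Slow stream: long-horizon memory kept under a fixed token budget.
        self.slow_memory: list = []
        self.memory_budget = memory_budget

    def observe(self, tokens: list, scores: list) -> None:
        """Ingest one step of tokens with per-token importance scores."""
        self.fast_window.append(tokens)
        # Token pruning (heuristic assumed here): keep only the top-scoring
        # quarter of each step, so memory grows far slower than the episode.
        keep = max(1, len(tokens) // 4)
        ranked = sorted(zip(scores, tokens), key=lambda p: p[0], reverse=True)
        self.slow_memory.extend(tok for _, tok in ranked[:keep])
        self.slow_memory = self.slow_memory[-self.memory_budget:]

    def context(self) -> list:
        """Context for the next action: pruned memory + full recent window."""
        recent = [tok for step in self.fast_window for tok in step]
        return self.slow_memory + recent
```

The point of the split is that per-step decoding only attends over a bounded window, while the pruned memory preserves salient observations from arbitrarily long episodes at fixed cost.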
Quick Start & Requirements
Setup requires building habitat-sim (v0.2.4) and habitat-lab (v0.2.4) from source, cloning the StreamVLN repo, and installing its dependencies.
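A hypothetical setup script mirroring those steps; the repository URLs, version tags, and install flags below are assumptions based on the habitat projects' usual conventions (and the StreamVLN repo path is guessed from the org name), not verified instructions:

```python
# Hypothetical setup script; URLs, tags, and flags are assumptions,
# not commands confirmed by the StreamVLN documentation.
import subprocess

def run(cmd, cwd=None):
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

# Build habitat-sim v0.2.4 from source (headless build for servers).
run(["git", "clone", "--branch", "v0.2.4", "--depth", "1",
     "https://github.com/facebookresearch/habitat-sim.git"])
run(["python", "setup.py", "install", "--headless"], cwd="habitat-sim")

# Install habitat-lab v0.2.4 (the package lives in a subdirectory of the repo).
run(["git", "clone", "--branch", "v0.2.4", "--depth", "1",
     "https://github.com/facebookresearch/habitat-lab.git"])
run(["pip", "install", "-e", "habitat-lab"], cwd="habitat-lab")

# Clone StreamVLN and install its dependencies (repo path assumed).
run(["git", "clone", "https://github.com/InternRobotics/StreamVLN.git"])
run(["pip", "install", "-r", "requirements.txt"], cwd="StreamVLN")
```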
Maintenance & Community
Recent activity (Sept 2025) indicates active development. No specific community channels or explicit maintainer details are provided beyond the author list.
Licensing & Compatibility
Licensed under CC BY-NC-SA 4.0, which restricts use to non-commercial purposes and requires derivative works to be shared under the same terms.
Limitations & Caveats
The CC BY-NC-SA 4.0 license prohibits commercial use. Setup is complex, requiring substantial data preparation and multiple dependencies, including specific versions of habitat-sim and habitat-lab.