MIV-XJTU / JanusVLN: Vision-language navigation framework with dual implicit memory
Top 93.9% on SourcePulse
JanusVLN is a novel Vision-Language Navigation (VLN) framework designed to overcome the limitations of approaches dominated by 2D semantics. It targets researchers and developers building embodied AI agents, enabling next-generation spatial agents through a focus on 3D spatial-semantic synergy inspired by human cognitive processes.
How It Works
The core innovation is a dual implicit memory architecture that mimics human navigation by integrating semantic understanding (left brain) with spatial cognition (right brain). The design maintains two complementary, fixed-size neural memories, steering VLN research toward 3D spatial-semantic synergy for improved agent perception and decision-making.
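As an illustration only, the sketch below shows what a pair of fixed-size implicit memories could look like; the class names, capacity, mean-pooling fusion, and update rule are assumptions for exposition, not JanusVLN's actual implementation.

```python
# Hypothetical sketch of a dual fixed-size implicit memory (not JanusVLN's real code).
# Each memory keeps at most `capacity` feature vectors; new observations evict the
# oldest entries, so memory cost stays constant over arbitrarily long episodes.
from collections import deque
import torch


class ImplicitMemory:
    def __init__(self, capacity: int, dim: int):
        self.buffer = deque(maxlen=capacity)  # fixed-size: oldest features are dropped
        self.dim = dim

    def write(self, features: torch.Tensor) -> None:
        self.buffer.append(features.detach())  # store per-step features implicitly

    def read(self) -> torch.Tensor:
        if not self.buffer:
            return torch.zeros(1, self.dim)
        return torch.stack(list(self.buffer))  # (T, dim) context for the policy


class DualMemoryAgent:
    """Combines a semantic memory (2D visual-language features) with a spatial
    memory (3D geometric features), mirroring the left/right-brain analogy."""

    def __init__(self, capacity: int = 16, dim: int = 512):
        self.semantic_memory = ImplicitMemory(capacity, dim)
        self.spatial_memory = ImplicitMemory(capacity, dim)

    def step(self, semantic_feat: torch.Tensor, spatial_feat: torch.Tensor) -> torch.Tensor:
        self.semantic_memory.write(semantic_feat)
        self.spatial_memory.write(spatial_feat)
        # Fuse the two memory streams into one context vector for action prediction.
        return torch.cat([self.semantic_memory.read().mean(0),
                          self.spatial_memory.read().mean(0)], dim=-1)
```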
Quick Start & Requirements
Installation requires cloning the repo, creating a Python 3.9 Conda environment, and installing pinned versions of habitat-sim (0.2.4) and habitat-lab (v0.2.4). PyTorch 2.5.1 with CUDA 12.4 is required. Data preparation is extensive, involving downloads of MP3D/HM3D scenes, VLN-CE episodes (R2R, RxR, ScaleVLN), and trajectory data from ModelScope. After environment setup, the primary install command is pip install -e .
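As a hedged sanity check (not part of the project's documented workflow), the snippet below verifies the pinned interpreter, PyTorch, and CUDA versions before starting the lengthy data preparation; the habitat_sim version attribute is an assumption about how that package reports its version.

```python
# Quick environment check against the pinned versions listed above.
import sys
import torch

assert sys.version_info[:2] == (3, 9), f"Expected Python 3.9, got {sys.version}"
assert torch.__version__.startswith("2.5.1"), f"Expected PyTorch 2.5.1, got {torch.__version__}"
assert torch.version.cuda == "12.4", f"Expected CUDA 12.4, got {torch.version.cuda}"

try:
    import habitat_sim
    # __version__ attribute assumed; adjust if the package reports its version differently.
    print("habitat-sim:", getattr(habitat_sim, "__version__", "unknown"))
except ImportError:
    print("habitat-sim not installed; install version 0.2.4 per the quick start")
```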
Highlighted Details
Two model variants, JanusVLN_Base and JanusVLN_Extra, are provided.
Maintenance & Community
Affiliated with Amap, Alibaba Group, and Xi’an Jiaotong University. No specific community channels or active maintenance signals are detailed in the provided README.
Licensing & Compatibility
The license type and compatibility notes for commercial or closed-source use are not specified in the provided README content.
Limitations & Caveats
Recent issues report incorrect weights for the JanusVLN_Extra model. Setup involves complex data preparation and tightly pinned dependency versions. The project builds upon multiple other codebases.