zhangyuejoslin: Survey of Vision-and-Language Navigation leveraging foundation models
Summary
This repository accompanies the TMLR 2024 survey paper "Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models." It addresses the rapidly evolving field of Vision-and-Language Navigation (VLN) by structuring recent advancements, particularly those driven by foundation models. Aimed at researchers and practitioners, it offers a principled framework for understanding VLN challenges and solutions, highlighting opportunities for leveraging large-scale models in embodied AI.
How It Works
The survey adopts a top-down review methodology, organizing VLN research via a principled framework for embodied planning and reasoning. It emphasizes how foundation models shape current methods and future directions. The approach categorizes existing work into key areas: World Models, Human Models, VLN Agents, Behavior Analysis, Continuous Environments (VLN-CE), and LLM/VLM-based agents (Zero-shot and Fine-tuning). This structure provides clarity and identifies research gaps for VLN specialists and foundation model researchers.
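The categories listed above can be pictured as a small taxonomy over paper entries. The following is only an illustrative sketch, not anything shipped with the repository: the names `Category`, `Entry`, `group_by_category`, and `missing_categories` are hypothetical, the sample entries are placeholders rather than real papers, and splitting the LLM/VLM-based agents into two enum members is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Category(Enum):
    """Categories named in the summary above (labels copied from the text)."""
    WORLD_MODELS = "World Models"
    HUMAN_MODELS = "Human Models"
    VLN_AGENTS = "VLN Agents"
    BEHAVIOR_ANALYSIS = "Behavior Analysis"
    VLN_CE = "Continuous Environments (VLN-CE)"
    LLM_VLM_ZERO_SHOT = "LLM/VLM-based agents (Zero-shot)"
    LLM_VLM_FINE_TUNING = "LLM/VLM-based agents (Fine-tuning)"


@dataclass(frozen=True)
class Entry:
    """One surveyed work; the field set is an assumption for illustration."""
    title: str
    category: Category
    venue: Optional[str] = None
    code_url: Optional[str] = None


def group_by_category(entries: List[Entry]) -> Dict[Category, List[Entry]]:
    """Bucket entries under their category, preserving input order."""
    grouped: Dict[Category, List[Entry]] = defaultdict(list)
    for entry in entries:
        grouped[entry.category].append(entry)
    return dict(grouped)


def missing_categories(entries: List[Entry]) -> List[Category]:
    """Return categories with no entries, in enum definition order."""
    covered = {entry.category for entry in entries}
    return [category for category in Category if category not in covered]


# Placeholder entries only; these are not real papers from the survey.
SAMPLE_ENTRIES = [
    Entry("Placeholder paper A", Category.WORLD_MODELS),
    Entry("Placeholder paper B", Category.VLN_CE),
    Entry("Placeholder paper C", Category.WORLD_MODELS),
]

if __name__ == "__main__":
    for category, items in group_by_category(SAMPLE_ENTRIES).items():
        print(category.value, [item.title for item in items])
    print("Uncovered:", [c.value for c in missing_categories(SAMPLE_ENTRIES)])
```

Grouping entries this way makes it easy to see which categories a personal reading list already covers and which it does not.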
Quick Start & Requirements
This repository is a survey companion, not a runnable software project. It serves as a curated collection of research papers, their venues, dates, and associated code repositories. No installation or execution commands are applicable. The primary resource is the survey paper itself, available via arXiv:2407.07035.
Maintenance & Community
The README describes the repository as actively maintained, with frequent updates planned to track the latest VLN research. Researchers are encouraged to suggest additional work by emailing zhan1624@msu.edu or opening a GitHub issue. No community channels such as Discord or Slack are mentioned.
Licensing & Compatibility
The README does not specify a license for the repository or the survey. Because the repository is a survey companion and a collection of links, licensing and compatibility concerns for its own content are minimal. Users should check the licenses of the individual linked projects.
Limitations & Caveats
The repository is a curated overview, not an executable system. Although frequent updates are planned, it remains a snapshot of the research at a given point in time. Its focus is VLN research involving foundation models, so it may offer less depth on earlier VLN approaches. It does not provide implementation details or performance benchmarks for the individual surveyed methods.