VLN-Survey-with-Foundation-Models by zhangyuejoslin

Survey of Vision-and-Language Navigation leveraging foundation models

Created 1 year ago

290 stars

Top 90.6% on SourcePulse

Project Summary

Summary

This repository accompanies the TMLR 2024 survey paper "Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models." It organizes recent advances in Vision-and-Language Navigation (VLN), particularly those driven by foundation models. Aimed at researchers and practitioners, it offers a principled framework for understanding VLN challenges and solutions and highlights opportunities for applying large-scale models in embodied AI.

How It Works

The survey takes a top-down approach, organizing VLN research within a principled framework for embodied planning and reasoning, and emphasizes how foundation models shape current methods and future directions. It groups existing work into these areas: World Model, Human Model, VLN Agent, Behavior Analysis, Continuous Environments (VLN-CE), and LLM/VLM-based VLN Agents (Zero-shot and Fine-tuning). This structure is meant to help both VLN specialists and foundation model researchers locate related work and identify research gaps.

Quick Start & Requirements

This repository is a survey companion, not a runnable software project. It serves as a curated collection of research papers, their venues, dates, and associated code repositories. No installation or execution commands are applicable. The primary resource is the survey paper itself, available via arXiv:2407.07035.

Highlighted Details

The survey categorizes VLN research into distinct areas: World Model, Human Model, VLN Agent, Behavior Analysis, Continuous Environments (VLN-CE), and LLM/VLM-based VLN Agents (Zero-shot and Fine-tuning).
It provides an extensive list of relevant academic papers, giving each paper's publication venue, date, and a direct link to its code repository.
The repository is intended as a living document, with frequent updates planned and community contributions actively encouraged via email or GitHub issues.
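
The repository itself contains no code, but the structure described above (papers recorded with a venue, date, and code link, then grouped by area) can be pictured as a small data model. The following is only an illustrative sketch, not anything shipped with the repository: the names `Category`, `PaperEntry`, and `group_by_category` are hypothetical, the LLM/VLM area is split into two enum members purely for convenience, and the sample entries are placeholders rather than real papers.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """Areas named in the summary; the enum itself is a hypothetical construct."""
    WORLD_MODEL = "World Model"
    HUMAN_MODEL = "Human Model"
    VLN_AGENT = "VLN Agent"
    BEHAVIOR_ANALYSIS = "Behavior Analysis"
    VLN_CE = "Continuous Environments (VLN-CE)"
    LLM_VLM_ZERO_SHOT = "LLM/VLM-based VLN Agents (Zero-shot)"
    LLM_VLM_FINE_TUNING = "LLM/VLM-based VLN Agents (Fine-tuning)"


@dataclass(frozen=True)
class PaperEntry:
    """One listed paper: title, venue, date, area, and an optional code link."""
    title: str
    venue: str
    date: str  # free-form date string; the format here is an assumption
    category: Category
    code_url: Optional[str] = None  # None when no code repository is linked


def group_by_category(entries):
    """Return a dict mapping each Category to the entries filed under it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.category].append(entry)
    return dict(grouped)


# Placeholder entries for illustration only; these are not real papers.
entries = [
    PaperEntry("Example Paper A", "ExampleConf", "2024-01",
               Category.WORLD_MODEL, "https://example.com/a"),
    PaperEntry("Example Paper B", "ExampleConf", "2024-02", Category.VLN_CE),
    PaperEntry("Example Paper C", "ExampleConf", "2024-03",
               Category.WORLD_MODEL),
]

grouped = group_by_category(entries)
for category, papers in grouped.items():
    print(category.value, [p.title for p in papers])
```

A frozen dataclass is used so that entries behave like immutable records, which suits a curated list that is read far more often than it is modified.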

Maintenance & Community

According to the README, the repository is actively maintained, with frequent updates planned to reflect the latest VLN research. Researchers are encouraged to suggest additional work by emailing zhan1624@msu.edu or opening a GitHub issue. No dedicated community channels such as Discord or Slack are mentioned.

Licensing & Compatibility

The README does not specify a license for the repository or the survey. Because the repository is a survey companion and a collection of links rather than software, licensing and compatibility concerns for its own content are minimal. Users should check the licenses of the individual linked projects.

Limitations & Caveats

The repository is a curated overview, not an executable system. Although frequent updates are planned, it represents a snapshot of the research at the time of its latest revision. Its focus is specifically on VLN research involving foundation models, so earlier VLN approaches may receive less depth. It does not provide detailed implementation specifics or performance benchmarks for the individual surveyed methods.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

9 stars in the last 30 days