awesome-vla-for-ad by worldbench

Advancing autonomous driving with Vision-Language-Action models

Created 5 months ago

301 stars

Top 88.8% on SourcePulse

Project Summary

This repository provides a comprehensive survey of Vision-Language-Action (VLA) models for autonomous driving (AD), motivated by the limitations of traditional modular AD pipelines. It targets AD researchers and engineers, tracing the evolution from Vision-Action (VA) to VLA models and offering a structured overview of current paradigms and advances.

How It Works

The survey categorizes VLA models into two principal paradigms: End-to-End VLA, which integrates perception, reasoning, and planning within a single model, and Dual-System VLA, which separates slow deliberation (handled by vision-language models, VLMs) from fast, safety-critical execution (handled by planners). Both paradigms aim to overcome the limitations of traditional modular pipelines, which often struggle in complex, dynamic, or long-tailed scenarios and propagate upstream perception errors into downstream planning. By integrating these stages more tightly, VLA models may improve performance and robustness in challenging driving environments.
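The structural difference between the two paradigms can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration only: the `Observation` and `Action` types, the class names, the speed thresholds, and the `reason_every` parameter are assumptions, and the rule-based stand-ins replace what would be learned models in any real system described by the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical sensor summary; a real system would carry camera frames."""
    obstacle_distance_m: float
    instruction: str


@dataclass(frozen=True)
class Action:
    """Hypothetical low-level driving command."""
    target_speed_mps: float


class EndToEndVLA:
    """End-to-End VLA: one model maps observation and language to an action."""

    def act(self, obs: Observation) -> Action:
        # Stand-in for a single learned network (assumption: simple rule).
        return Action(0.0 if obs.obstacle_distance_m < 10.0 else 10.0)


class SlowReasoner:
    """Stand-in for the VLM that deliberates at a low rate."""

    def deliberate(self, obs: Observation) -> str:
        return "yield" if "pedestrian" in obs.instruction.lower() else "proceed"


class FastPlanner:
    """Stand-in for the fast planner that runs on every step."""

    def plan(self, obs: Observation, guidance: str) -> Action:
        if obs.obstacle_distance_m < 10.0:
            return Action(0.0)  # safety-critical check ignores stale guidance
        return Action(3.0 if guidance == "yield" else 10.0)


class DualSystemVLA:
    """Dual-System VLA: slow deliberation guides fast execution."""

    def __init__(self, reasoner: SlowReasoner, planner: FastPlanner,
                 reason_every: int = 5) -> None:
        self.reasoner = reasoner
        self.planner = planner
        self.reason_every = reason_every
        self._step = 0
        self._guidance = "proceed"

    def act(self, obs: Observation) -> Action:
        # Refresh the guidance only every `reason_every` steps.
        if self._step % self.reason_every == 0:
            self._guidance = self.reasoner.deliberate(obs)
        self._step += 1
        return self.planner.plan(obs, self._guidance)
```

In this sketch the planner runs on every step while the reasoner is consulted only periodically, which is one simple way to express the split between deliberation and fast execution; actual systems may couple the two components differently.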

Quick Start & Requirements

This repository is a survey and provides no installation or execution instructions for any specific model. It links to the associated paper, a project page, and a Hugging Face leaderboard for further details.

Highlighted Details

Comprehensive review of Vision-Action (VA) and Vision-Language-Action (VLA) models for autonomous driving, tracing their evolution.
Organization of VLA models into two principal paradigms: End-to-End VLA and Dual-System VLA.
Detailed categorization of VLA models, including Textual Action Generators, Numerical Action Generators, Explicit Action Guidance, and Implicit Representations Transfer.
Inclusion of relevant datasets and benchmarks for VLA in autonomous driving research, alongside application areas.
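For readers who want to index papers against this taxonomy, the categories can be held in a small lookup structure. The sketch below is illustrative only: the assignment of each category to a paradigm is an assumption that this summary does not state and should be checked against the survey itself, and `Paradigm`, `TAXONOMY`, and `categories_for` are hypothetical names.

```python
from enum import Enum


class Paradigm(Enum):
    END_TO_END = "End-to-End VLA"
    DUAL_SYSTEM = "Dual-System VLA"


# Assumed grouping of categories under paradigms; verify against the survey.
TAXONOMY = {
    "Textual Action Generators": Paradigm.END_TO_END,
    "Numerical Action Generators": Paradigm.END_TO_END,
    "Explicit Action Guidance": Paradigm.DUAL_SYSTEM,
    "Implicit Representations Transfer": Paradigm.DUAL_SYSTEM,
}


def categories_for(paradigm: Paradigm) -> list[str]:
    """Return the category names filed under a paradigm, sorted by name."""
    return sorted(name for name, p in TAXONOMY.items() if p is paradigm)
```

Calling `categories_for(Paradigm.DUAL_SYSTEM)` returns the two category names mapped to the dual-system paradigm in the dictionary above.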

Maintenance & Community

The provided README does not contain information regarding project maintenance, community channels (e.g., Discord, Slack), or a public roadmap.

Licensing & Compatibility

No specific software license is mentioned in the provided README content.

Limitations & Caveats

As a survey, this repository does not offer a deployable system but rather an organized overview of existing research. Its scope is limited to VLA models specifically for autonomous driving applications.

Health Check

Last Commit: 4 days ago

Responsiveness: Inactive

Star History

38 stars in the last 30 days