Awesome-VLA-Papers by Psi-Robot

Vision-Language-Action (VLA) research paper compilation

Created 9 months ago

401 stars

Top 72.2% on SourcePulse

Project Summary

This repository is a curated bibliography of research papers on Vision-Language-Action (VLA) models, aimed at researchers and practitioners in embodied AI and robotics. It organizes seminal and recent works into a structured overview of VLA research, its foundational components, and its applications.

How It Works

The collection is organized by approach to integrating the vision, language, and action modalities, primarily by how actions are tokenized and represented. Key sections include "Language Description as Action Tokens," "Code as Action Tokens," "Affordance as Action Tokens," and "Reasoning as Action Tokens." It also covers foundational language and vision models, specific VLA architectures, and related survey papers. Each entry links directly to the publication, code repository, pre-trained models, and official website.

Highlighted Details

Action Tokenization Paradigms: Papers are categorized by how actions are tokenized and integrated with vision and language, covering paradigms such as language descriptions, code, affordances, keypoints, and reasoning.
Broad Model Coverage: Encompasses a wide array of foundational models (e.g., Transformers, ViT, CLIP) and specific VLA architectures, spanning applications in robotics, autonomous driving, and generalist agents.
Rich Metadata and Links: Each listed paper includes direct links to its publication, associated code repositories, pre-trained models, datasets, and official websites, facilitating efficient access to research artifacts.
Survey and Dataset Compilations: Features dedicated sections for related survey papers and relevant datasets, providing broader context and resources for understanding the VLA research landscape and its data requirements.

Maintenance & Community

The README does not specify maintenance details or community channels for this repository itself. The individual research papers listed may have their own associated communities and development efforts.

Licensing & Compatibility

No licensing information is provided for this repository or the curated collection of research papers.

Limitations & Caveats

This repository is a reference guide to the VLA research landscape only. It provides no executable code, models, or tools for implementation.

Explore Similar Projects

Motus by thu-ml

Embodied-AI-Paper-TopConf by Songwxuan

Large-VLM-based-VLA-for-Robotic-Manipulation by JiuTian-VL

vla0 by NVlabs

RDT2 by thu-ml

molmoact by allenai

Awesome-Embodied-AI by yunlongdong

CogACT by microsoft

UniVLA by OpenDriveLab

awesome-embodied-vla-va-vln by jonyzhang2023

Magma by microsoft

Isaac-GR00T by NVIDIA