awesome-vla-wam by DravenALG

Curated research on Vision-Language-Action and World Action Models

Created 5 months ago

869 stars

Top 40.5% on SourcePulse

Project Summary

This repository curates research on Vision-Language-Action (VLA) models and World Action Models (WAM), giving a structured overview of foundation models for robotics. It is aimed at scholars and researchers who need to track the fast-moving embodied AI landscape, understand current advances, and identify future research directions.

How It Works

The project categorizes and lists research papers and models related to VLA and WAM. VLA models build on pre-trained Vision-Language Models (VLMs) to produce language-grounded, scalable robot policies, a line of work that originates with concepts like RT-2. WAMs predict actions by drawing on world-modeling capabilities, as exemplified by DreamZero. The list also highlights the intersection of the two: WAMs that are built upon VLMs.

Quick Start & Requirements

This is a curated list of research papers and models, not a deployable software package. It does not provide direct installation instructions or specific software requirements. Users are expected to engage with the cited research papers individually.

Highlighted Details

Comprehensive categorization of VLA models, including sub-areas like VLA with 3D/4D modeling, Reinforcement Learning, efficiency, latent actions, and domain-specific applications.
Extensive coverage of World Action Models (WAM), categorized by origin (VideoGen, VLM, or from scratch), alongside general world models.
A dedicated "Resources" section lists relevant Robotics Datasets, Ego Human Datasets, Benchmarks/Environments, Physics Engines, and Hardware.
Key foundational models like RT-2, DreamZero, Qwen-VLA, and Octo are frequently cited.

Maintenance & Community

The repository invites community contributions: new papers can be proposed through pull requests or issues. It aims for continuous updates and refinement to keep the list high-quality.

Licensing & Compatibility

No specific open-source license is mentioned in the provided README content. Users should assume all rights to the list itself are reserved; the listed papers and models are governed by whatever terms their original authors specify.

Limitations & Caveats

Because this is a curated list rather than a single software project, it has no defined technical limitations of its own. The README does, however, point to "10 Open Challenges Steering the Future of Vision-Language-Action Models," which indicates active areas of research and potential gaps in current capabilities.

Health Check

Last Commit

1 week ago

Responsiveness

Inactive

Star History

78 stars in the last 30 days