awesome-vla-study by MilkClouds

Advancing embodied AI with Vision-Language-Action models

Created 4 months ago

309 stars

Top 86.8% on SourcePulse

Project Summary

A structured reading list for Vision-Language-Action (VLA) models that guides readers from foundational generative concepts to state-of-the-art robot foundation models, data scaling, reinforcement learning (RL) fine-tuning, and world models. It is an ordered resource for researchers and engineers who want to get up to speed on the VLA domain quickly.

How It Works

The reading list is divided into six progressive phases. It begins with generative model foundations (diffusion, flow matching) and advances through early and current robot foundation model architectures. Later phases cover data scaling strategies, efficient inference, and advanced topics such as RL fine-tuning, reasoning, and world models. Papers appear in a recommended reading order so that readers can follow how VLA models evolved.

Quick Start & Requirements

No software installation is needed. Prerequisites are basic probability, optimization, and deep learning fundamentals (Transformers, attention). The list links introductory courses such as MIT 6.S191 and Andrej Karpathy's "Zero to Hero" for readers who need that background. A weekly presentation and discussion format is recommended for structured learning.

Highlighted Details

Foundational Models: Covers early VLA paradigms (RT-1, RT-2) and open-source generalist policies (Octo, OpenVLA).
Architectural Evolution: Explores VLM + action head designs (CogACT, GR00T N1) and models leveraging intermediate VLM features (π0, InternVLA-M1).
Data Scaling & Collection: Highlights large-scale datasets (OXE, AgiBot World) and novel collection methods (UMI, video-to-VLA transfer).
Advanced Capabilities: Features papers on efficient inference (SmolVLA, RTC), dual-system approaches (Helix, Fast-in-Slow), RL fine-tuning, reasoning (CoT-VLA, ThinkAct), and world models (UniVLA, DreamZero).

Maintenance & Community

As a curated reading list, this repository has no software to maintain. Contributions of new papers or structural improvements are welcome via GitHub issues or pull requests. The repository also links to related curated lists.

Licensing & Compatibility

No license is specified for the reading list itself. The content links to academic papers, each with its own publication terms, so whether any material can be used commercially depends on the license of the individual paper.

Limitations & Caveats

The list is a snapshot of the VLA landscape at a specific point in time, and working through it requires significant self-directed study. Because VLA research moves quickly, newer work may fall outside the list's scope.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

30 stars in the last 30 days