Large-VLM-based-VLA-for-Robotic-Manipulation by JiuTian-VL

Advancing robotic manipulation with large Vision-Language-Action models

Created 9 months ago

354 stars

Top 79.1% on SourcePulse

Project Summary

This repository is a comprehensive, curated survey and resource hub for large Vision-Language-Action (VLA) models applied to robotic manipulation. Aimed at researchers and engineers in AI and robotics, it addresses the growing need for robots that can interpret natural language, perceive complex environments, and execute diverse tasks with better generalization. The project offers a structured overview of this rapidly evolving field, gathering key papers, benchmarks, and resources in one place for easier reference.

How It Works

The project systematically categorizes research papers and resources on large VLM-based VLA models for robotic manipulation. It organizes them into the main architectural paradigms, namely monolithic models (single-system and dual-system) and hierarchical models, plus advanced fields such as reinforcement learning, training-free methods, learning from human videos, and world model-based approaches. This structure lets readers compare methodologies within each paradigm and locate relevant work quickly.

Quick Start & Requirements

This repository is a curated list of research resources, not a deployable software package. Its primary resource is the survey paper "Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey", available on arXiv. The repository itself lists no installation or runtime requirements.

Highlighted Details

Comprehensive categorization of VLA models into Monolithic, Hierarchical, and Other Advanced Fields (RL, Training-Free, Human Videos, World Models).
Extensive listing of Datasets and Benchmarks, including Real-world Robot Datasets, Simulation Environments, Human Behavior Datasets, and Embodied Datasets.
Covers a wide range of research publications, many dated 2023 to 2025, reflecting recent work in the field.
Includes direct links to papers (via arXiv) and associated code repositories where available.

Maintenance & Community

The authors describe the project as ongoing, with a note reading "We're still cooking — Stay tuned!", and state a commitment to continuously update the repository with newly published works. Contributions are welcome via GitHub pull requests, and the authors provide contact information for questions and suggestions.

Licensing & Compatibility

The repository is licensed under the MIT License, which generally permits broad use, modification, and distribution, including for commercial purposes, with minimal restrictions.

Limitations & Caveats

As a survey and curated list, this repository does not provide a unified codebase or a direct implementation of VLA models. The "still cooking" status suggests ongoing development and potential for future additions or revisions. The rapid pace of research in this domain means the landscape is constantly shifting.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

16 stars in the last 30 days