Large-VLM-based-VLA-for-Robotic-Manipulation by JiuTian-VL

Advancing robotic manipulation with large Vision-Language-Action models

Created 6 months ago
267 stars

Top 96.0% on SourcePulse

Project Summary

This repository serves as a comprehensive, curated survey and resource hub for large Vision-Language-Action (VLA) models applied to robotic manipulation. It addresses the growing need for robots that can interpret natural language, perceive complex environments, and execute diverse tasks with enhanced generalization, targeting researchers and engineers in AI and robotics. The project offers a structured overview of the rapidly evolving field, consolidating key papers, benchmarks, and resources for easier access and reference.

How It Works

The project systematically categorizes and lists research papers and resources related to large VLM-based VLA models for robotic manipulation. It organizes findings into key architectural paradigms, including monolithic (single and dual-system) and hierarchical models, alongside advanced fields like reinforcement learning, training-free methods, learning from human videos, and world model-based approaches. This structured compilation facilitates a deep understanding of the landscape and the diverse methodologies employed.

Quick Start & Requirements

This repository is a curated list of research resources, not a deployable software package. The primary resource is the survey paper "Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey" (available on arXiv). The repository itself has no installation or runtime requirements.

Highlighted Details

  • Comprehensive categorization of VLA models into Monolithic, Hierarchical, and Other Advanced Fields (RL, Training-Free, Human Videos, World Models).
  • Extensive listing of Datasets and Benchmarks, including Real-world Robot Datasets, Simulation Environments, Human Behavior Datasets, and Embodied Datasets.
  • Covers a wide range of research publications, with many entries dated 2023-2025, reflecting the cutting edge of the field.
  • Includes direct links to papers (via arXiv) and associated code repositories where available.

Maintenance & Community

The project is actively maintained: a note reading "We're still cooking — Stay tuned!" signals a commitment to continuously updating the repository with newly published works. Contributions are welcomed via GitHub pull requests, and author contact information is provided for questions and suggestions.

Licensing & Compatibility

The repository is licensed under the MIT License, which generally permits broad use, modification, and distribution, including for commercial purposes, with minimal restrictions.

Limitations & Caveats

As a survey and curated list, this repository does not provide a unified codebase or a direct implementation of VLA models. The "still cooking" status suggests ongoing development and potential for future additions or revisions. The rapid pace of research in this domain means the landscape is constantly shifting.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 42 stars in the last 30 days
