VLA-Adapter by OpenHelix-Team

Tiny-scale Vision-Language-Action model paradigm

Created 3 months ago
1,890 stars

Top 22.8% on SourcePulse

Project Summary

Summary

VLA-Adapter offers an efficient paradigm for tiny-scale Vision-Language-Action (VLA) models in robotics. It adapts large Vision-Language Models (VLMs) for embodied tasks, benefiting researchers and developers with limited computational resources.

How It Works

This project implements a lightweight adapter module that bridges pre-trained VLMs with robotic control policies. It enables efficient fine-tuning on benchmarks such as LIBERO and CALVIN by leveraging VLM backbones (e.g., Prismatic-VLMs Qwen2.5-0.5B), prioritizing efficiency and adaptability for resource-constrained environments.
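The adapter idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual implementation: the class name VLAAdapter, the feature dimensions, the residual bottleneck design, and the mean-pooling policy head are all assumptions.

```python
# Illustrative sketch of a lightweight adapter bridging frozen VLM
# features to a robotic action head. All names and dimensions here are
# hypothetical, not VLA-Adapter's real API.
import torch
import torch.nn as nn


class VLAAdapter(nn.Module):
    def __init__(self, vlm_dim=896, adapter_dim=256, action_dim=7):
        super().__init__()
        # Bottleneck projection keeps the trainable parameter count small
        self.down = nn.Linear(vlm_dim, adapter_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(adapter_dim, vlm_dim)
        # Policy head mapping pooled features to continuous actions
        self.action_head = nn.Linear(vlm_dim, action_dim)

    def forward(self, vlm_features):
        # vlm_features: (batch, seq_len, vlm_dim) from a frozen VLM backbone
        h = vlm_features + self.up(self.act(self.down(vlm_features)))
        pooled = h.mean(dim=1)  # simple pooling over token positions
        return self.action_head(pooled)


features = torch.randn(2, 16, 896)  # stand-in for VLM outputs
actions = VLAAdapter()(features)
print(actions.shape)  # torch.Size([2, 7])
```

Only the adapter and head would be trained; the VLM backbone stays frozen, which is what makes fine-tuning feasible on small GPUs.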

Quick Start & Requirements

  • Installation: Requires Python 3.10.16, PyTorch 2.2.0, and Flash Attention 2. Setup involves creating a Conda environment, cloning the repo, running pip install -e ., and installing flash-attn.
  • Dependencies: The LIBERO (approx. 10GB) and CALVIN (approx. 50GB) benchmark repositories and datasets must be installed and downloaded separately. A VLM backbone (e.g., Prismatic-VLMs Qwen2.5-0.5B) must be downloaded and placed in /pretrained_models.
  • Hardware: Training is supported on GPUs with as little as 10GB of VRAM (requiring careful parameter tuning) up to multi-GPU setups (e.g., 4x H100). Inference can run on a single GPU.
  • Links: Paper: https://arxiv.org/pdf/2509.09372, Project Page: https://vla-adapter.github.io/, HuggingFace Models: https://huggingface.co/VLA-Adapter.
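The setup steps above can be sketched as a shell session. This is an assumption-laden sketch, not the README's exact commands: the repository URL, environment name, and PyTorch install line are guesses, so verify them (and pick a CUDA-matched PyTorch wheel) against the actual repo.

```shell
# Sketch only: version pins follow the summary above; the repo URL and
# environment name are assumed, so check the README before running.
conda create -n vla-adapter python=3.10.16 -y
conda activate vla-adapter
git clone https://github.com/OpenHelix-Team/VLA-Adapter.git
cd VLA-Adapter
pip install torch==2.2.0
pip install -e .
# Flash Attention 2 builds against the already-installed torch
pip install flash-attn --no-build-isolation
```

The LIBERO and CALVIN benchmarks and the VLM backbone are downloaded separately, per the dependency notes above.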

Highlighted Details

  • Achieves state-of-the-art performance on the LIBERO benchmark for tiny-scale VLA models (Pro version: 98.5% avg success).
  • Demonstrates strong performance on the CALVIN benchmark (Pro version: 80.0% avg success).
  • Provides detailed training guidance for GPUs with VRAM as low as 10GB, enhancing accessibility.
  • Offers an original and an enhanced "Pro" version with improved performance and smaller model size.

Maintenance & Community

The project recently released its code and paper, with ongoing development planned for enhanced versions (VLA-Adapter++), broader system compatibility, and integration with more foundation models. Community links include Twitter and WeChat.

Licensing & Compatibility

The README does not explicitly state a license. Clarification is needed before commercial use or closed-source linking, though the project builds on other open-source works.

Limitations & Caveats

The project is under active development with planned improvements. Inference performance may vary slightly on GPUs other than the NVIDIA H100. No specific known bugs or deprecations are mentioned.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 7
  • Star History: 109 stars in the last 30 days
