VLA-Adapter by OpenHelix-Team

Tiny-scale Vision-Language-Action model paradigm

Created 3 months ago
1,890 stars

Top 22.8% on SourcePulse

Project Summary

Summary

VLA-Adapter offers an efficient paradigm for tiny-scale Vision-Language-Action (VLA) models in robotics. It adapts large Vision-Language Models (VLMs) for embodied tasks, benefiting researchers and developers with limited computational resources.

How It Works

This project implements a lightweight adapter module that bridges pre-trained VLMs with robotic control policies. It enables efficient fine-tuning on benchmarks such as LIBERO and CALVIN by leveraging VLM backbones (e.g., Prismatic-VLMs Qwen2.5-0.5B), prioritizing efficiency and adaptability for resource-constrained environments.
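The adapter idea described above can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual implementation: the class name VLAAdapter, the feature dimensions, the residual bottleneck design, and the mean-pooling policy head are all assumptions.

```python
# Illustrative sketch of a lightweight adapter bridging frozen VLM
# features to a robotic action head. All names and dimensions here are
# hypothetical, not VLA-Adapter's real API.
import torch
import torch.nn as nn


class VLAAdapter(nn.Module):
    def __init__(self, vlm_dim=896, adapter_dim=256, action_dim=7):
        super().__init__()
        # Bottleneck projection keeps the trainable parameter count small
        self.down = nn.Linear(vlm_dim, adapter_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(adapter_dim, vlm_dim)
        # Policy head mapping pooled features to continuous actions
        self.action_head = nn.Linear(vlm_dim, action_dim)

    def forward(self, vlm_features):
        # vlm_features: (batch, seq_len, vlm_dim) from a frozen VLM backbone
        h = vlm_features + self.up(self.act(self.down(vlm_features)))
        pooled = h.mean(dim=1)  # simple pooling over token positions
        return self.action_head(pooled)


features = torch.randn(2, 16, 896)  # stand-in for VLM outputs
actions = VLAAdapter()(features)
print(actions.shape)  # torch.Size([2, 7])
```

Only the adapter and head would be trained; the VLM backbone stays frozen, which is what makes fine-tuning feasible on small GPUs.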

Quick Start & Requirements

  • Installation: Requires Python 3.10.16, PyTorch 2.2.0, and Flash Attention 2. Setup involves creating a Conda environment, cloning the repo, running pip install -e ., and installing flash-attn.
  • Dependencies: The LIBERO (approx. 10GB) and CALVIN (approx. 50GB) benchmark repositories and datasets must be installed and downloaded separately. A VLM backbone (e.g., Prismatic-VLMs Qwen2.5-0.5B) must be downloaded and placed in /pretrained_models.
  • Hardware: Training is supported on GPUs with as little as 10GB of VRAM (requiring careful parameter tuning) up to multi-GPU setups (e.g., 4x H100). Inference can run on a single GPU.
  • Links: Paper: https://arxiv.org/pdf/2509.09372, Project Page: https://vla-adapter.github.io/, HuggingFace Models: https://huggingface.co/VLA-Adapter.
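The setup steps above can be sketched as a shell session. This is an assumption-laden sketch, not the README's exact commands: the repository URL, environment name, and PyTorch install line are guesses, so verify them (and pick a CUDA-matched PyTorch wheel) against the actual repo.

```shell
# Sketch only: version pins follow the summary above; the repo URL and
# environment name are assumed, so check the README before running.
conda create -n vla-adapter python=3.10.16 -y
conda activate vla-adapter
git clone https://github.com/OpenHelix-Team/VLA-Adapter.git
cd VLA-Adapter
pip install torch==2.2.0
pip install -e .
# Flash Attention 2 builds against the already-installed torch
pip install flash-attn --no-build-isolation
```

The LIBERO and CALVIN benchmarks and the VLM backbone are downloaded separately, per the dependency notes above.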

Highlighted Details

  • Achieves state-of-the-art performance on the LIBERO benchmark for tiny-scale VLA models (Pro version: 98.5% avg success).
  • Demonstrates strong performance on the CALVIN benchmark (Pro version: 80.0% avg success).
  • Provides detailed training guidance for GPUs with VRAM as low as 10GB, enhancing accessibility.
  • Offers an original and an enhanced "Pro" version with improved performance and smaller model size.

Maintenance & Community

The project recently released its code and paper, with ongoing development planned for enhanced versions (VLA-Adapter++), broader system compatibility, and integration with more foundation models. Community links include Twitter and WeChat.

Licensing & Compatibility

The README does not explicitly state a license. Clarification is needed before commercial use or closed-source linking, though the project builds on other open-source works.

Limitations & Caveats

The project is under active development with planned improvements. Inference performance may vary slightly on GPUs other than the NVIDIA H100. No specific known bugs or deprecations are mentioned.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 7
  • Star History: 109 stars in the last 30 days
