X-VLA  by 2toinf

Robotic control model using soft-prompted Transformers for cross-embodiment generalization

Created 2 months ago
281 stars

Top 92.9% on SourcePulse

Project Summary

X-VLA addresses the challenge of creating scalable, generalizable Vision-Language-Action (VLA) models for robotic control across diverse embodiments. It offers a unified Transformer architecture with soft prompts, enabling robust deployment in both simulation and real-world systems. This approach benefits researchers and engineers by providing a high-performance, adaptable VLA solution for heterogeneous robotic platforms.

How It Works

The core of X-VLA is a unified Transformer backbone augmented with embodiment-specific soft prompts—learnable embeddings that guide multi-domain policy learning. This design decouples the general policy model from embodiment-specific details, facilitating cross-embodiment generalization. A Server-Client architecture further enhances deployment flexibility, separating the model from environment dependencies and supporting distributed inference. This approach achieves state-of-the-art performance across various robotic platforms.
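The soft-prompt mechanism described above can be sketched as follows. This is an illustrative reconstruction, not X-VLA's actual code: the names (`embodiment_prompts`, `build_input`), the hidden size, and the prompt length are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 64      # assumed hidden size, for illustration only
PROMPT_LEN = 4   # assumed number of learnable tokens per embodiment

# One learnable prompt matrix per embodiment; during training these would
# be optimized jointly with the shared Transformer backbone.
embodiment_prompts = {
    "franka_arm": rng.normal(size=(PROMPT_LEN, HIDDEN)),
    "agibot_g1": rng.normal(size=(PROMPT_LEN, HIDDEN)),
}

def build_input(embodiment: str, obs_tokens: np.ndarray) -> np.ndarray:
    """Prepend an embodiment's soft prompt to its observation tokens.

    The shared backbone sees [prompt; tokens], so embodiment-specific
    detail lives in the prompt rather than in the backbone weights.
    """
    prompt = embodiment_prompts[embodiment]
    return np.concatenate([prompt, obs_tokens], axis=0)

obs = rng.normal(size=(10, HIDDEN))  # 10 observation tokens
seq = build_input("franka_arm", obs)
print(seq.shape)  # (14, 64): 4 prompt tokens + 10 observation tokens
```

Because only the prompt dictionary varies per robot, adding a new embodiment amounts to learning one small embedding matrix rather than retraining the backbone.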

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.10 Conda environment, and installing dependencies via pip install -r requirements.txt. The project supports inference via a Server-Client architecture, with pre-trained models available on Hugging Face. Links to the paper, project page, and Hugging Face models are provided.
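The installation steps above might look like the following; the repository URL is inferred from the author and project name and may differ, so check the GitHub page before running.

```shell
# Clone the repository (URL assumed from the author/project name)
git clone https://github.com/2toinf/X-VLA.git
cd X-VLA

# Create and activate a Python 3.10 Conda environment
conda create -n xvla python=3.10 -y
conda activate xvla

# Install dependencies
pip install -r requirements.txt
```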

Highlighted Details

  • Championed the AgiBot World Challenge at IROS 2025.
  • Demonstrates state-of-the-art generalization across six simulation and three real-world robotic platforms.
  • Employs a standardized EE6D (End-Effector 6D) control space for consistent action representation.
  • Offers LoRA fine-tuning capabilities with released checkpoints and inference code.
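The EE6D control space mentioned above can be pictured as an action vector with a 3D position component and a 3D orientation component. The sketch below is illustrative only: the exact parameterization X-VLA uses (axis-angle vs. Euler angles, absolute vs. relative targets, gripper handling) is an assumption, not taken from the repository.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class EE6DAction:
    """Illustrative end-effector 6D action: 3D position + 3D orientation.

    The orientation parameterization (axis-angle here) is an assumption.
    """
    position: np.ndarray     # (3,) x, y, z in meters
    orientation: np.ndarray  # (3,) axis-angle rotation, radians

    def to_vector(self) -> np.ndarray:
        # Flatten into the 6D vector a policy head would regress.
        return np.concatenate([self.position, self.orientation])

a = EE6DAction(np.array([0.3, 0.0, 0.2]), np.array([0.0, np.pi, 0.0]))
print(a.to_vector().shape)  # (6,)
```

Standardizing every platform onto one such action layout is what lets a single policy head drive heterogeneous robots.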

Maintenance & Community

The project is maintained by 2toINF. Feedback, issues, and contributions are welcomed via GitHub Discussions and Pull Requests. Specific community channels like Discord or Slack are not detailed in the README.

Licensing & Compatibility

X-VLA is licensed under the Apache License 2.0, permitting free use, modification, and distribution, including for commercial purposes.

Limitations & Caveats

A slight performance drop (around 1%) was noted after converting models to Hugging Face format and is under investigation. Guidance on converting between relative and absolute actions currently requires consulting specific GitHub issues. Evaluation guidance for VLABench is pending updates.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 28
  • Star History: 138 stars in the last 30 days
