UniVLA is a unified vision-language-action framework designed for learning generalist robotic policies across diverse environments and embodiments. It targets researchers and engineers in robotics and AI who aim to develop adaptable and efficient control systems, offering significant improvements over previous methods like OpenVLA.
How It Works
UniVLA introduces task-centric latent actions, learned without supervision via a VQ-VAE, to create an embodiment-agnostic action space. This lets the model leverage data from diverse sources, including video without explicit action labels. A generalist policy is pretrained on this latent action space; lightweight, embodiment-specific action decoders are then attached for deployment, enabling efficient fine-tuning and adaptation.
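The core operation of a VQ-VAE is snapping a continuous encoding to the nearest entry in a learned codebook, yielding a discrete latent action token. A minimal sketch of that quantization step, with illustrative function and variable names (not UniVLA's actual API):

```python
import numpy as np

def quantize_latent_action(encoding, codebook):
    """Map a continuous latent to its nearest codebook entry (VQ step).

    `encoding` (shape [d]) and `codebook` (shape [K, d]) are illustrative
    names; the released code defines its own latent-action model.
    """
    # Squared L2 distance from the encoding to every codebook vector.
    dists = np.sum((codebook - encoding) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Toy example: a 4-entry codebook of 3-dim latent actions.
codebook = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
idx, token = quantize_latent_action(np.array([0.9, 0.1, 0.0]), codebook)
```

Because downstream policy learning operates on the discrete index `idx` rather than raw robot commands, the same codebook can be shared across embodiments.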
Quick Start & Requirements
- Install: Clone the repository and install dependencies with `pip install -e .`.
- Prerequisites: Python 3.10, PyTorch 2.2.0 with CUDA 12.1, Flash Attention 2.
- Setup: Creating a dedicated Conda environment before installing PyTorch is recommended.
- Docs: Paper, Demo Page (Coming Soon).
Highlighted Details
- Achieves state-of-the-art performance on LIBERO benchmarks, outperforming models like Diffusion Policy, Octo, OpenVLA, and TraceVLA.
- Demonstrates significant computational efficiency, requiring only 5% of the resources used by OpenVLA for full-scale pretraining.
- Offers cost-efficient pre-training options for specific datasets (e.g., BridgeV2, Ego4D human videos).
- Supports real-world deployment with lightweight action decoders (roughly 12M parameters) and parameter-efficient fine-tuning via LoRA.
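LoRA, the parameter-efficient fine-tuning method mentioned above, freezes the pretrained weight matrix and trains only a low-rank additive update. A minimal sketch of a LoRA forward pass, with illustrative names and shapes not taken from the UniVLA codebase:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass with a LoRA adapter: y = x @ (W + alpha * A @ B).

    W (d_in x d_out) is the frozen pretrained weight; only the low-rank
    factors A (d_in x r) and B (r x d_out) are trained.
    """
    return x @ W + alpha * (x @ A) @ B

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 2              # rank r << d keeps trainable params small
W = rng.standard_normal((d_in, d_out))
A = rng.standard_normal((d_in, r)) * 0.01
B = np.zeros((r, d_out))              # B starts at zero, so the adapter is a no-op
x = rng.standard_normal((1, d_in))
y = lora_forward(x, W, A, B)          # initially identical to the frozen base model
```

Initializing `B` to zero means fine-tuning starts exactly from the pretrained policy, which is the standard LoRA design choice.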
Maintenance & Community
- Official implementation of a paper published at RSS 2025.
- Primary contact: Qingwen Bu (buqingwen@opendrivelab.com).
- Code released in May 2025.
Licensing & Compatibility
- The repository is released under the MIT License, permitting commercial use and closed-source linking.
Limitations & Caveats
- Demo page and some specific fine-tuning scripts (e.g., Room2Room, CALVIN, SimplerEnv) are marked as "Coming Soon" or "TODO".
- Real-world deployment guidelines are based on the AgiLex platform, requiring adaptation for other systems.