NVlabs
State-of-the-art Vision-Language-Action models via text-based action representation
Top 86.3% on SourcePulse
Summary
VLA-0 offers a simplified approach to building state-of-the-art Vision-Language-Action (VLA) models for robot manipulation. It targets researchers and engineers who want strong performance on benchmarks and real-world tasks without modifying the base Vision-Language Model (VLM) or relying on extensive robotics pretraining.
How It Works
This project explores representing robot actions directly as text, a largely unexplored strategy. VLA-0 leverages existing VLMs like Qwen2.5-VL-3B without architectural changes or special action tokens, treating actions as text outputs. This "zero modification" approach simplifies VLA development and surprisingly yields superior performance compared to methods that alter VLM vocabularies or introduce action heads, achieving SOTA on LIBERO and real-world tests.
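The idea of treating an action as plain text can be illustrated with a minimal sketch. It is not the project's code: the bin count, the per-dimension value ranges, the space-separated integer format, and the function names `action_to_text` and `text_to_action` are all assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumption: each action dimension is mapped to an integer in [0, NUM_BINS].
NUM_BINS = 1000


def action_to_text(action: Sequence[float], low: Sequence[float],
                   high: Sequence[float], num_bins: int = NUM_BINS) -> str:
    """Encode a continuous action vector as space-separated integers."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)      # keep the value inside its range
        frac = (clipped - lo) / (hi - lo)      # normalize to [0, 1]
        tokens.append(str(round(frac * num_bins)))
    return " ".join(tokens)


def text_to_action(text: str, low: Sequence[float], high: Sequence[float],
                   num_bins: int = NUM_BINS) -> List[float]:
    """Decode generated text back into a continuous action vector."""
    ints = [int(tok) for tok in text.split()]
    if len(ints) != len(low):
        raise ValueError("unexpected number of action dimensions")
    action = []
    for i, lo, hi in zip(ints, low, high):
        i = min(max(i, 0), num_bins)           # guard against out-of-range output
        action.append(lo + (i / num_bins) * (hi - lo))
    return action


low, high = [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]
encoded = action_to_text([0.0, -1.0, 1.0], low, high)
decoded = text_to_action(encoded, low, high)
```

Under these assumptions, the training target for the VLM is just the encoded string, and the text it generates at inference time is parsed back into a numeric action, so neither the vocabulary nor the architecture needs to change.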
Quick Start & Requirements
Clone with submodules (git clone --recurse-submodules), create and activate a conda environment (conda create -n vla0 python=3.10, conda activate vla0), then install with extras (PIP_REQ_EXTRAS=qwen,libero pip install --no-build-isolation -e ".[qwen,libero]"). The RoboVerse library requires a separate install (cd libs/RoboVerse && PIP_REQ_EXTRAS=lerobot pip install --no-build-isolation -e ".[lerobot]" && cd ../..).
Highlighted Details
Maintenance & Community
Developed by Ankit Goyal, Hugo Hadfield, Xuning Yang, Valts Bulkis, and Fabio Ramos (NVIDIA). Community contributions are welcomed. Direct contact: ankgoyal@umich.edu. No specific community channels or roadmap links are provided.
Licensing & Compatibility
Code and model are released under CC BY-NC 4.0 (non-commercial use only). The base model is additionally subject to the Qwen Research License. Commercial adoption requires careful review of both licenses.
Limitations & Caveats
Potential improvements include TensorRT-LLM integration for faster inference (a target of 6 Hz, up from 4 Hz) and lower-precision deployment (e.g., INT8) for additional speed. Compatibility with newer LeRobot versions has not been validated, and the direct LeRobot integrations could be simplified.