Robbyant: Pragmatic Vision-Language-Action model for robotics
New!
Top 45.0% on SourcePulse
Summary
LingBot-VLA is a pragmatic Vision-Language-Action (VLA) foundation model for robotics that lets robots interpret visual observations and language commands and act on them. It is pre-trained on extensive real-world data for strong benchmark performance and efficient training, and targets researchers and engineers in embodied AI.
How It Works
The model is pre-trained on 20,000 hours of real-world data collected from nine dual-arm robot configurations, which underpins its robust pre-training and strong benchmark performance. Its pragmatic training pipeline achieves 1.5x-2.8x speedups over existing VLA codebases. LingBot-VLA is offered in two variants, depth-free and depth-distilled, with the depth-distilled variant providing enhanced spatial awareness.
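To make the input/output contract of a VLA policy concrete, the sketch below shows a toy model that fuses image and language features and decodes a dual-arm action vector. This is purely illustrative: it is not LingBot-VLA's actual architecture or API, and the feature dimensions and 14-dim action space (7 DoF per arm) are assumptions.

```python
# Hypothetical, minimal VLA-style policy head (NOT LingBot-VLA's real code).
import torch
import torch.nn as nn


class ToyVLAPolicy(nn.Module):
    def __init__(self, vision_dim=256, text_dim=128, action_dim=14):
        super().__init__()
        # Stand-ins for the outputs of a real vision backbone and language model.
        self.vision_proj = nn.Linear(vision_dim, 256)
        self.text_proj = nn.Linear(text_dim, 256)
        # Fused features are decoded into a low-level action vector
        # (14 dims here as an assumed dual-arm, 7-DoF-per-arm setup).
        self.action_head = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, image_feats, text_feats):
        fused = torch.cat(
            [self.vision_proj(image_feats), self.text_proj(text_feats)], dim=-1
        )
        return self.action_head(fused)


# Dummy usage with random features standing in for encoder outputs.
policy = ToyVLAPolicy()
actions = policy(torch.randn(1, 256), torch.randn(1, 128))
print(actions.shape)  # torch.Size([1, 14])
```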
Quick Start & Requirements
Requires Python 3.12.3, PyTorch 2.8.0, and CUDA 12.8. Installation involves cloning and installing the lerobot repository (pinned to commit 0cf864870cf29f4738d3ade893e6fd13fbd7cdb5) and the lingbot-vla repository, along with flash_attn and other dependencies. Pre-trained weights for LingBot-VLA, Qwen2.5-VL-3B-Instruct, MoGe, and LingBot-Depth must be downloaded separately.
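A minimal sketch for sanity-checking the stated environment requirements before installing the repositories; the expected versions come from the Quick Start above, and the checks are only assumptions about what a working setup should report.

```python
# Verify Python / PyTorch / CUDA versions against the stated requirements.
import sys

import torch

print(f"Python : {sys.version.split()[0]}")        # expected 3.12.x
print(f"PyTorch: {torch.__version__}")             # expected 2.8.0
print(f"CUDA   : {torch.version.cuda}")            # expected 12.8
print(f"GPU available: {torch.cuda.is_available()}")

assert sys.version_info[:2] == (3, 12), "Python 3.12 expected"
assert torch.__version__.startswith("2.8"), "PyTorch 2.8.x expected"
```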