lingbot-vla by Robbyant

Pragmatic Vision-Language-Action model for robotics

Created 2 weeks ago

779 stars

Top 45.0% on SourcePulse

View on GitHub

Project Summary

Summary

LingBot-VLA is a pragmatic Vision-Language-Action (VLA) foundation model for robotics that lets robots interpret visual observations and language commands and act on them. It is pre-trained on extensive real-world data for strong performance and efficient training, and targets researchers and engineers in embodied AI.

How It Works

The model is pre-trained on 20,000 hours of real-world data collected from nine dual-arm robot configurations, which underpins its benchmark performance. Its pragmatic design emphasizes training efficiency, reporting 1.5x-2.8x speedups over existing VLA codebases. LingBot-VLA ships in two variants: a depth-free model and a depth-distilled model with enhanced spatial awareness.

Quick Start & Requirements

Requires Python 3.12.3, PyTorch 2.8.0, and CUDA 12.8. Installation involves cloning and installing the lerobot repository (pinned to commit 0cf864870cf29f4738d3ade893e6fd13fbd7cdb5) and the lingbot-vla repository, along with flash_attn and other dependencies. Pre-trained weights for LingBot-VLA, Qwen2.5-VL-3B-Instruct, MoGe, and LingBot-Depth must be downloaded separately.
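
Before installing the heavier dependencies, a quick environment check can confirm the local setup matches the versions listed above. This is only a sketch: the version pins (Python 3.12.3, PyTorch 2.8.0, CUDA 12.8) are taken from this summary, and whether they are exact requirements or minimums is not stated here, so consult the repository's own requirements.

    # Minimal environment check against the versions listed on this page.
    # Assumes PyTorch is already installed; comparisons are informational only.
    import sys

    import torch

    EXPECTED_PYTHON = (3, 12, 3)
    EXPECTED_TORCH = "2.8.0"
    EXPECTED_CUDA = "12.8"

    def check_environment() -> None:
        py = ".".join(map(str, sys.version_info[:3]))
        print(f"Python  : {py} (expected {'.'.join(map(str, EXPECTED_PYTHON))})")
        print(f"PyTorch : {torch.__version__} (expected {EXPECTED_TORCH})")
        print(f"CUDA    : {torch.version.cuda} (expected {EXPECTED_CUDA})")
        print(f"GPU available: {torch.cuda.is_available()}")

    if __name__ == "__main__":
        check_environment()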

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 18

Star History

781 stars in the last 17 days
