lingbot-vla by Robbyant

Pragmatic Vision-Language-Action model for robotics

Created 2 weeks ago

779 stars

Top 45.0% on SourcePulse

View on GitHub

Project Summary

Summary

LingBot-VLA is a pragmatic Vision-Language-Action (VLA) foundation model for robotics that lets robots interpret visual observations and language commands and act on them. It is pre-trained on extensive real-world data for strong performance and efficient training, and targets researchers and engineers in embodied AI.

How It Works

The model is pre-trained on 20,000 hours of real-world data collected from nine dual-arm robot configurations, which underpins its benchmark performance. Its pragmatic design emphasizes training efficiency, reporting 1.5x-2.8x speedups over existing VLA codebases. LingBot-VLA ships in two variants: a depth-free model and a depth-distilled model with enhanced spatial awareness.

Quick Start & Requirements

Requires Python 3.12.3, PyTorch 2.8.0, and CUDA 12.8. Installation involves cloning and installing the lerobot repository (pinned to commit 0cf864870cf29f4738d3ade893e6fd13fbd7cdb5) and the lingbot-vla repository, along with flash_attn and other dependencies. Pre-trained weights for LingBot-VLA, Qwen2.5-VL-3B-Instruct, MoGe, and LingBot-Depth must be downloaded separately.
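
Before installing the heavier dependencies, a quick environment check can confirm the local setup matches the versions listed above. This is only a sketch: the version pins (Python 3.12.3, PyTorch 2.8.0, CUDA 12.8) are taken from this summary, and whether they are exact requirements or minimums is not stated here, so consult the repository's own requirements.

    # Minimal environment check against the versions listed on this page.
    # Assumes PyTorch is already installed; comparisons are informational only.
    import sys

    import torch

    EXPECTED_PYTHON = (3, 12, 3)
    EXPECTED_TORCH = "2.8.0"
    EXPECTED_CUDA = "12.8"

    def check_environment() -> None:
        py = ".".join(map(str, sys.version_info[:3]))
        print(f"Python  : {py} (expected {'.'.join(map(str, EXPECTED_PYTHON))})")
        print(f"PyTorch : {torch.__version__} (expected {EXPECTED_TORCH})")
        print(f"CUDA    : {torch.version.cuda} (expected {EXPECTED_CUDA})")
        print(f"GPU available: {torch.cuda.is_available()}")

    if __name__ == "__main__":
        check_environment()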

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 18

Star History

781 stars in the last 17 days
