mit-han-lab: Real-time Vision-Language Agent deployment and fine-tuning
Summary
VLASH provides an efficient, easy-to-use framework for deploying Vision-Language Agents (VLAs) in real-time, focusing on fast reaction and smooth motion. It targets researchers and engineers needing performant VLA capabilities for robotics and AI, offering optimized inference and simplified fine-tuning on consumer hardware.
How It Works
The core approach combines asynchronous inference with future-state awareness to achieve high reaction speeds and stable operation without overhead. Action quantization further accelerates robot execution. For efficient adaptation, VLASH integrates LoRA with shared observation encoding, enabling fine-tuning on consumer GPUs.
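As a rough illustration of the action-quantization idea mentioned above, here is a minimal sketch using uniform binning over a fixed action range. The binning scheme, range, and bin count are assumptions for illustration; the README does not specify VLASH's exact quantization method.

```python
def quantize(action, lo=-1.0, hi=1.0, bins=256):
    """Map a continuous action to a discrete bin index (uniform binning; assumed scheme)."""
    action = min(max(action, lo), hi)  # clamp to the valid action range
    return round((action - lo) / (hi - lo) * (bins - 1))

def dequantize(index, lo=-1.0, hi=1.0, bins=256):
    """Recover an approximate continuous action from its bin index."""
    return lo + index / (bins - 1) * (hi - lo)

# A quantized action round-trips to within half a bin width
a = 0.37
idx = quantize(a)
assert abs(dequantize(idx) - a) <= (2.0 / 255) / 2
```

Discretizing actions this way lets the policy emit small integer tokens instead of full-precision floats, which is one common way such schemes reduce decoding cost at execution time.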
Quick Start & Requirements
Setup requires Python 3.10 within a Conda environment, ffmpeg 7.1.1 (installed via conda-forge), and an editable install (pip install -e .). It integrates with LeRobot datasets, models, and robots, using YAML for configuration.
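The setup steps above can be sketched as a short shell sequence; the environment name vlash is an assumption, and the final command must run from the repository root.

```shell
# Create and activate a Python 3.10 Conda environment (name "vlash" is assumed)
conda create -n vlash python=3.10 -y
conda activate vlash

# ffmpeg 7.1.1 from conda-forge, as noted in the requirements
conda install -c conda-forge "ffmpeg=7.1.1" -y

# Editable install from the repository root
pip install -e .
```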
Maintenance & Community
Built upon LeRobot and PEFT. No specific community channels or roadmap links are detailed in the README.
Licensing & Compatibility
Released under the Apache 2.0 license, permitting commercial use and modification with standard attribution.
Limitations & Caveats
QLoRA fine-tuning for policies under 8GB GPU memory is listed as a future development item (TODO). Optimization for lower-end GPUs remains a focus.