vlash by mit-han-lab

Real-time Vision-Language-Action (VLA) model deployment and fine-tuning

Created 4 months ago
362 stars

Top 77.6% on SourcePulse

Project Summary

Summary

VLASH is an efficient, easy-to-use framework for deploying Vision-Language-Action (VLA) models in real time, with an emphasis on fast reaction and smooth motion. It targets robotics and AI researchers and engineers who need performant VLA inference, and it offers optimized inference and simplified fine-tuning on consumer hardware.

How It Works

The core approach combines asynchronous inference with future-state awareness to achieve fast reactions and stable operation with low overhead. Action quantization further speeds up robot execution. For efficient adaptation, VLASH integrates LoRA with shared observation encoding, which enables fine-tuning on consumer GPUs.
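The summary does not spell out how future-state awareness works. A minimal sketch of one common reading: while inference runs, the robot keeps executing already-queued actions, so the policy should be conditioned on the state the robot will reach by the time the new actions start. The sketch assumes actions are per-joint deltas and that inference latency is a known number of control steps; `roll_forward`, `delay_steps`, and the toy values are hypothetical and not part of VLASH's API.

```python
def roll_forward(state, pending_actions, delay_steps):
    """Estimate the robot state at the time a new action chunk starts executing.

    Assumption (not from the source): each action is a per-joint delta
    that is added to the current state.
    """
    future = list(state)
    # Only the actions executed during the inference delay move the robot.
    for action in pending_actions[:delay_steps]:
        future = [s + a for s, a in zip(future, action)]
    return future


# Toy example: 2-joint robot, 4 queued delta actions,
# inference assumed to take 3 control steps.
state = [0.0, 1.0]
pending = [[0.5, 0.0], [0.5, -0.25], [0.25, -0.25], [0.25, 0.0]]

future_state = roll_forward(state, pending, delay_steps=3)
print(future_state)  # [1.25, 0.5]
```

In an asynchronous loop, `future_state` rather than `state` would be passed to the policy, so the returned actions line up with where the robot actually is when they begin executing.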

Quick Start & Requirements

Setup requires Python 3.10 in a Conda environment, ffmpeg 7.1.1 (installed from conda-forge), and an editable install of the package (pip install -e .). VLASH integrates with LeRobot datasets, models, and robots, and uses YAML files for configuration.

Highlighted Details

  • Achieves an inference frequency above 30 Hz for $\pi_{0.5}$ on an RTX 5090.
  • Supports LoRA fine-tuning of $\pi_{0.5}$ and $\pi_0$ in under 12 GB of GPU memory.
  • Features action quantization for faster robot execution and asynchronous inference for stable, low-overhead operation.
  • Integrates with LeRobot datasets (v2.1, v3.0) and various policy architectures.
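The summary does not describe how action quantization is implemented. One plausible reading, shown here purely as an illustration, is that consecutive fine-grained actions are merged into coarser ones, so the robot covers the same motion in fewer control steps. The sketch assumes delta actions; `quantize_actions` and `factor` are hypothetical names, not VLASH functions.

```python
def quantize_actions(chunk, factor):
    """Merge every `factor` consecutive delta actions into one coarser action.

    Assumption (not from the source): actions are per-joint deltas, so
    summing a group preserves the total displacement.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    merged = []
    for start in range(0, len(chunk), factor):
        group = chunk[start:start + factor]
        # Sum each joint's deltas across the group.
        merged.append([sum(joint) for joint in zip(*group)])
    return merged


chunk = [[0.25, 0.0], [0.25, 0.5], [0.5, 0.0], [0.0, -0.5], [0.25, 0.25]]
coarse = quantize_actions(chunk, factor=2)
print(coarse)  # [[0.5, 0.5], [0.5, -0.5], [0.25, 0.25]]
```

Five steps become three while the summed displacement per joint is unchanged. The trade-off is coarser motion, so the merge factor would need tuning per robot and task.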

Maintenance & Community

Built upon LeRobot and PEFT. No specific community channels or roadmap links are detailed in the README.

Licensing & Compatibility

Released under the Apache 2.0 license, permitting commercial use and modification with standard attribution.

Limitations & Caveats

QLoRA fine-tuning of policies in under 8 GB of GPU memory is listed as a future development item (TODO). Optimization for lower-end GPUs remains a focus.
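As background on why LoRA and QLoRA lower memory requirements, here is a rough back-of-the-envelope sketch. LoRA trains two rank-r factors per adapted layer instead of the full weight matrix, and QLoRA additionally stores the frozen base weights in 4 bits. The layer dimensions and rank below are hypothetical and are not those of $\pi_0$ or $\pi_{0.5}$; the estimate ignores activations, optimizer state, and quantization metadata.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters for one LoRA-adapted linear layer: r * (d_in + d_out)."""
    return rank * (d_in + d_out)


def weight_bytes(num_params, bits_per_param):
    """Storage for the weights alone, ignoring any metadata."""
    return num_params * bits_per_param // 8


# Hypothetical layer size and rank, chosen only for illustration.
d_in, d_out, rank = 2048, 2048, 16

full_params = d_in * d_out
lora_params = lora_trainable_params(d_in, d_out, rank)

print(full_params)                    # 4194304
print(lora_params)                    # 65536
print(full_params // lora_params)     # 64
print(weight_bytes(full_params, 16))  # 8388608 bytes at 16-bit
print(weight_bytes(full_params, 4))   # 2097152 bytes at 4-bit
```

For this toy layer, LoRA trains 64 times fewer parameters, and 4-bit storage cuts the frozen weights to a quarter of their 16-bit size.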

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 2

Star History

24 stars in the last 30 days
