HybridVLA by PKU-HMI-Lab

Unified vision-language-action model

Created 6 months ago
286 stars

Top 91.6% on SourcePulse

View on GitHub
Project Summary

HybridVLA is a unified vision-language-action model that combines diffusion and autoregressive approaches for robotic control. It is designed for researchers and engineers working on embodied AI and robotic manipulation, offering improved generalization and performance across diverse real-world scenarios.

How It Works

HybridVLA integrates diffusion models for continuous action generation with autoregressive models for discrete reasoning within a single LLM framework. This hybrid approach leverages diffusion's probabilistic action modeling and autoregression's planning capabilities. The model is pretrained on large-scale, cross-embodiment robotic datasets and fine-tuned on simulation and custom real-world data, yielding robust performance and generalization.
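The sketch below illustrates this hybrid decoding idea in PyTorch: a shared backbone feeds both an autoregressive head that emits discrete tokens and a diffusion head that iteratively denoises a continuous action vector. All module names, dimensions, and the simplified denoising update are illustrative assumptions, not HybridVLA's actual architecture.

```python
# Toy sketch of hybrid diffusion + autoregressive action prediction.
# Every name and size here is an assumption for illustration only.
import torch
import torch.nn as nn

class HybridActionModel(nn.Module):
    def __init__(self, d_model=256, vocab_size=512, action_dim=7, n_steps=10):
        super().__init__()
        self.n_steps = n_steps
        self.action_dim = action_dim
        # Stand-in for the shared LLM backbone: vision-language features in,
        # a single conditioning embedding out.
        self.backbone = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        # Autoregressive head: scores over a discrete action/reasoning vocabulary.
        self.ar_head = nn.Linear(d_model, vocab_size)
        # Diffusion head: predicts the noise on a continuous action vector,
        # conditioned on backbone features plus a timestep embedding.
        self.t_embed = nn.Embedding(n_steps, d_model)
        self.eps_head = nn.Sequential(
            nn.Linear(d_model + action_dim, d_model), nn.GELU(),
            nn.Linear(d_model, action_dim),
        )

    def forward(self, obs_feat):
        h = self.backbone(obs_feat)                  # (B, d_model)
        # Autoregressive branch: greedy pick of one discrete token.
        ar_tokens = self.ar_head(h).argmax(dim=-1)   # (B,)
        # Diffusion branch: start from noise and iteratively denoise.
        a = torch.randn(h.size(0), self.action_dim)
        for t in reversed(range(self.n_steps)):
            step = torch.full((h.size(0),), t, dtype=torch.long)
            cond = h + self.t_embed(step)
            eps = self.eps_head(torch.cat([cond, a], dim=-1))
            a = a - eps / self.n_steps               # crude stand-in for a DDPM update
        return ar_tokens, a

model = HybridActionModel()
tokens, actions = model(torch.randn(2, 256))
print(tokens.shape, actions.shape)  # torch.Size([2]) torch.Size([2, 7])
```

In the real model both branches share one LLM; the point of the sketch is only that a single forward pass can yield a discrete (autoregressive) prediction and a continuous (diffusion-denoised) action from the same conditioning features.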

Quick Start & Requirements

  • Installation: pip install -e . (after cloning the repo).
  • Prerequisites: Python >= 3.10, PyTorch >= 2.2.0, CUDA >= 12.0. For training, Flash-Attention 2 is required (a quick environment check is sketched after this list).
  • Setup: A conda environment setup is recommended. Testing in RLBench requires additional dependencies and CoppeliaSim installation.
  • Links: Project Page, Paper, Demo
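To verify the prerequisites above before installing, a minimal check like the following can be run. This helper is an assumption for illustration (it is not shipped with the repo) and uses only standard PyTorch attributes plus an optional flash_attn import.

```python
# Quick sanity check for the stated prerequisites; illustrative only.
import sys
import torch

assert sys.version_info >= (3, 10), "Python >= 3.10 required"
print("PyTorch:", torch.__version__)        # expect >= 2.2.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)  # expect >= 12.0
try:
    import flash_attn  # needed for training only
    print("Flash-Attention 2:", flash_attn.__version__)
except ImportError:
    print("flash_attn not installed (required for training)")
```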

Highlighted Details

  • Achieves strong generalization to unseen objects, backgrounds, positions, and lighting.
  • Pretrained checkpoint on a large-scale robotic dataset is available.
  • Supports fine-tuning on custom datasets like Bridge V2 with provided configuration adjustments.
  • Inference example demonstrates predicting actions using both diffusion and autoregressive modes (see the sketch after this list).
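Building on the toy HybridActionModel from the How It Works sketch, the following shows what dual-mode inference might look like. The feature shape and the way the two outputs are combined are assumptions, not the repository's actual inference API.

```python
import torch

# Reuses the illustrative HybridActionModel class defined in the earlier
# sketch; this is NOT the repository's real inference interface.
model = HybridActionModel()
model.eval()

obs_feat = torch.randn(1, 256)  # stand-in for fused vision-language features

with torch.no_grad():
    ar_tokens, diff_action = model(obs_feat)  # both modes in one forward pass

# The discrete tokens would normally be de-tokenized into an action, while the
# continuous vector can be executed directly; one simple strategy is to prefer
# the diffusion output and use the AR prediction as a consistency check.
print("AR action tokens:", ar_tokens.tolist())
print("Diffusion action:", diff_action.squeeze(0).tolist())
```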

Maintenance & Community

The project is actively developed by the PKU-HMI-Lab. Recent updates include script and configuration improvements and the open-sourcing of the RLBench environment.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is tested with CUDA 12.0; compatibility with lower CUDA versions is not guaranteed. The RLBench testing setup involves significant external dependencies and simulator installation.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 14 stars in the last 30 days

Explore Similar Projects

Starred by Alberto Taiuti (Cofounder of Luma AI) and Alex Yu (Research Scientist at OpenAI; former Cofounder of Luma AI).

GR-1 by bytedance

GPT-style model for visual robot manipulation research
0.7% · 279 stars · Created 1 year ago · Updated 1 year ago