HybridVLA by PKU-HMI-Lab

Unified vision-language-action model

Created 6 months ago
286 stars

Top 91.6% on SourcePulse

View on GitHub
Project Summary

HybridVLA is a unified vision-language-action model that combines diffusion and autoregressive approaches for robotic control. It is designed for researchers and engineers working on embodied AI and robotic manipulation, offering improved generalization and performance across diverse real-world scenarios.

How It Works

HybridVLA integrates diffusion models for continuous action generation with autoregressive models for discrete reasoning within a single LLM framework. This hybrid approach leverages diffusion's probabilistic action modeling and autoregression's planning capabilities. The model is pretrained on large-scale, cross-embodiment robotic datasets and fine-tuned on simulation and custom real-world data, yielding robust performance and generalization.
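The sketch below illustrates this hybrid decoding idea in PyTorch: a shared backbone feeds both an autoregressive head that emits discrete tokens and a diffusion head that iteratively denoises a continuous action vector. All module names, dimensions, and the simplified denoising update are illustrative assumptions, not HybridVLA's actual architecture.

```python
# Toy sketch of hybrid diffusion + autoregressive action prediction.
# Every name and size here is an assumption for illustration only.
import torch
import torch.nn as nn

class HybridActionModel(nn.Module):
    def __init__(self, d_model=256, vocab_size=512, action_dim=7, n_steps=10):
        super().__init__()
        self.n_steps = n_steps
        self.action_dim = action_dim
        # Stand-in for the shared LLM backbone: vision-language features in,
        # a single conditioning embedding out.
        self.backbone = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        # Autoregressive head: scores over a discrete action/reasoning vocabulary.
        self.ar_head = nn.Linear(d_model, vocab_size)
        # Diffusion head: predicts the noise on a continuous action vector,
        # conditioned on backbone features plus a timestep embedding.
        self.t_embed = nn.Embedding(n_steps, d_model)
        self.eps_head = nn.Sequential(
            nn.Linear(d_model + action_dim, d_model), nn.GELU(),
            nn.Linear(d_model, action_dim),
        )

    def forward(self, obs_feat):
        h = self.backbone(obs_feat)                  # (B, d_model)
        # Autoregressive branch: greedy pick of one discrete token.
        ar_tokens = self.ar_head(h).argmax(dim=-1)   # (B,)
        # Diffusion branch: start from noise and iteratively denoise.
        a = torch.randn(h.size(0), self.action_dim)
        for t in reversed(range(self.n_steps)):
            step = torch.full((h.size(0),), t, dtype=torch.long)
            cond = h + self.t_embed(step)
            eps = self.eps_head(torch.cat([cond, a], dim=-1))
            a = a - eps / self.n_steps               # crude stand-in for a DDPM update
        return ar_tokens, a

model = HybridActionModel()
tokens, actions = model(torch.randn(2, 256))
print(tokens.shape, actions.shape)  # torch.Size([2]) torch.Size([2, 7])
```

In the real model both branches share one LLM; the point of the sketch is only that a single forward pass can yield a discrete (autoregressive) prediction and a continuous (diffusion-denoised) action from the same conditioning features.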

Quick Start & Requirements

  • Installation: pip install -e . (after cloning the repo).
  • Prerequisites: Python >= 3.10, PyTorch >= 2.2.0, CUDA >= 12.0. For training, Flash-Attention 2 is required (a quick environment check is sketched after this list).
  • Setup: A conda environment setup is recommended. Testing in RLBench requires additional dependencies and CoppeliaSim installation.
  • Links: Project Page, Paper, Demo
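To verify the prerequisites above before installing, a minimal check like the following can be run. This helper is an assumption for illustration (it is not shipped with the repo) and uses only standard PyTorch attributes plus an optional flash_attn import.

```python
# Quick sanity check for the stated prerequisites; illustrative only.
import sys
import torch

assert sys.version_info >= (3, 10), "Python >= 3.10 required"
print("PyTorch:", torch.__version__)        # expect >= 2.2.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)  # expect >= 12.0
try:
    import flash_attn  # needed for training only
    print("Flash-Attention 2:", flash_attn.__version__)
except ImportError:
    print("flash_attn not installed (required for training)")
```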

Highlighted Details

  • Achieves strong generalization to unseen objects, backgrounds, positions, and lighting.
  • Pretrained checkpoint on a large-scale robotic dataset is available.
  • Supports fine-tuning on custom datasets like Bridge V2 with provided configuration adjustments.
  • Inference example demonstrates predicting actions using both diffusion and autoregressive modes (see the sketch after this list).
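Building on the toy HybridActionModel from the How It Works sketch, the following shows what dual-mode inference might look like. The feature shape and the way the two outputs are combined are assumptions, not the repository's actual inference API.

```python
import torch

# Reuses the illustrative HybridActionModel class defined in the earlier
# sketch; this is NOT the repository's real inference interface.
model = HybridActionModel()
model.eval()

obs_feat = torch.randn(1, 256)  # stand-in for fused vision-language features

with torch.no_grad():
    ar_tokens, diff_action = model(obs_feat)  # both modes in one forward pass

# The discrete tokens would normally be de-tokenized into an action, while the
# continuous vector can be executed directly; one simple strategy is to prefer
# the diffusion output and use the AR prediction as a consistency check.
print("AR action tokens:", ar_tokens.tolist())
print("Diffusion action:", diff_action.squeeze(0).tolist())
```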

Maintenance & Community

The project is actively developed by the PKU-HMI-Lab. Recent updates include script and configuration improvements and the open-sourcing of the RLBench environment.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is tested with CUDA 12.0; compatibility with lower CUDA versions is not guaranteed. The RLBench testing setup involves significant external dependencies and simulator installation.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 14 stars in the last 30 days

Explore Similar Projects

Starred by Alberto Taiuti (Cofounder of Luma AI) and Alex Yu (Research Scientist at OpenAI; former Cofounder of Luma AI).

GR-1 by bytedance

GPT-style model for visual robot manipulation research
0.7% · 279 stars · Created 1 year ago · Updated 1 year ago