Unified vision-language-action model
Top 91.6% on SourcePulse
HybridVLA is a unified vision-language-action model that combines diffusion and autoregressive approaches for robotic control. It is designed for researchers and engineers working on embodied AI and robotic manipulation, offering improved generalization and performance across diverse real-world scenarios.
How It Works
HybridVLA integrates diffusion models for continuous action generation with autoregressive models for discrete reasoning inside a single LLM framework. The hybrid approach leverages the probabilistic nature of diffusion and the planning capabilities of autoregression. The model is pretrained on large-scale cross-embodiment robotic datasets and fine-tuned on simulation and custom real-world data, which underpins its robust performance and generalization.
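The project's own code is not reproduced here, but the core idea can be sketched in a few lines of PyTorch: one shared backbone processes the token sequence, one head predicts the next discrete token autoregressively, and a small conditioned head iteratively refines a continuous action vector in place of a full diffusion sampler. All module names, sizes, and the simplified refinement loop below are illustrative assumptions, not HybridVLA's implementation.

```python
import torch
import torch.nn as nn

class HybridPolicySketch(nn.Module):
    """Toy model: one shared backbone, two output branches (sketch only)."""

    def __init__(self, vocab=1000, d_model=256, action_dim=7, denoise_steps=10):
        super().__init__()
        self.action_dim = action_dim
        self.denoise_steps = denoise_steps
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)           # autoregressive branch
        self.denoiser = nn.Sequential(                      # diffusion-style branch
            nn.Linear(d_model + action_dim + 1, d_model),
            nn.ReLU(),
            nn.Linear(d_model, action_dim),
        )

    def forward(self, token_ids):
        # Shared, causally masked context over the multimodal token sequence.
        causal = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1))
        h = self.backbone(self.embed(token_ids), mask=causal)  # (B, T, d_model)
        ctx = h[:, -1]                                          # last-token context
        next_token_logits = self.lm_head(ctx)                   # discrete reasoning token
        # Toy iterative refinement standing in for the diffusion sampler:
        # start from noise and repeatedly re-predict the continuous action.
        action = torch.randn(token_ids.size(0), self.action_dim)
        for t in range(self.denoise_steps, 0, -1):
            t_emb = torch.full((token_ids.size(0), 1), t / self.denoise_steps)
            action = self.denoiser(torch.cat([ctx, action, t_emb], dim=-1))
        return next_token_logits, action

logits, action = HybridPolicySketch()(torch.randint(0, 1000, (1, 8)))
print(logits.shape, action.shape)  # torch.Size([1, 1000]) torch.Size([1, 7])
```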
Quick Start & Requirements
Clone the repository, then install in editable mode:
pip install -e .
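A minimal post-install check is sketched below, assuming the cloned package exposes a top-level module; the module name "hybridvla" is a guess, so substitute whatever package the repository actually installs.

```python
# Verify that the editable install from "pip install -e ." is importable.
# "hybridvla" is an assumed package name, not confirmed by the project docs.
import importlib.util

spec = importlib.util.find_spec("hybridvla")
print("found on sys.path" if spec else "not found; re-run pip install -e .")
```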
Highlighted Details
Maintenance & Community
The project is actively developed by PKU-HMI-Lab. Recent updates include script and configuration improvements and the open-sourcing of the RLBench environment.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project is tested with CUDA 12.0; compatibility with lower CUDA versions is not guaranteed. The RLBench testing setup requires significant external dependencies and simulator installation.
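Before installing the simulator stack, it can help to confirm which CUDA toolkit the local PyTorch build targets. The snippet below uses only standard PyTorch calls and is not part of the project.

```python
# Report the CUDA toolkit version this PyTorch build was compiled against,
# for comparison with the CUDA 12.0 environment the project reports testing on.
import torch

print("PyTorch:", torch.__version__)
print("Compiled for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```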