Robotic manipulation with a dual-system VLA model
OpenHelix is an open-source implementation of a dual-system Vision-Language-Action (VLA) model for robotic manipulation. It addresses the challenge of enabling robots to understand and execute complex tasks from natural language instructions and visual input. The project targets researchers and developers in robotics and AI, offering a state-of-the-art (SOTA) model that performs strongly on benchmarks such as CALVIN ABC-D.
How It Works
OpenHelix uses a dual-system approach that pairs a Multimodal Large Language Model (MLLM) with a diffusion-based policy model: the MLLM processes the language instruction and visual input, while the diffusion policy generates robot actions. This split yields robust task understanding and precise action generation, and training the MLLM with a prompt-tuning method plus auxiliary tasks further improves performance, especially with longer execution horizons (EP_LEN=360) and delayed, asynchronous action settings (Asy(10)).
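To make the dual-system flow concrete, here is a minimal PyTorch sketch, emphatically not the OpenHelix code: the ToyMLLM and ToyDiffusionPolicy classes, their dimensions, and the fixed Euler-style denoising loop are illustrative assumptions standing in for the real MLLM backbone and learned diffusion schedule.

```python
import torch
import torch.nn as nn

class ToyMLLM(nn.Module):
    """Stand-in for the MLLM: fuses image and text features into one task latent."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, img_feat, txt_feat):
        return self.proj(torch.cat([img_feat, txt_feat], dim=-1))

class ToyDiffusionPolicy(nn.Module):
    """Stand-in for the diffusion head: predicts noise on an action sequence."""
    def __init__(self, action_dim=7, latent_dim=64):
        super().__init__()
        self.net = nn.Linear(action_dim + latent_dim + 1, action_dim)

    def forward(self, actions, t, cond):
        t_emb = torch.full(actions.shape[:-1] + (1,), float(t))  # scalar timestep per action
        c = cond.unsqueeze(1).expand(-1, actions.shape[1], -1)   # broadcast latent over horizon
        return self.net(torch.cat([actions, c, t_emb], dim=-1))

mllm, policy = ToyMLLM(), ToyDiffusionPolicy()
cond = mllm(torch.randn(1, 64), torch.randn(1, 64))  # one fused image+instruction latent
actions = torch.randn(1, 10, 7)                      # horizon of 10 actions, 7-DoF each
for t in reversed(range(16)):                        # crude fixed-step denoising loop
    actions = actions - policy(actions, t, cond) / 16.0
```

The design point is the split itself: the slow, semantically rich MLLM can refresh the conditioning latent at a lower rate while the lightweight policy head generates actions at control frequency, which is why delayed settings like Asy(10) are a natural stress test.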
Quick Start & Requirements
Key dependencies include diffusers, dgl, and flash-attn. Merging the released .safetensors checkpoint shards into a single pytorch_model.bin is necessary.
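The shard-merging step can be done with the safetensors library. A minimal sketch follows; the checkpoint/model-*.safetensors glob and output path are assumptions for illustration, not the project's actual file layout.

```python
import glob

import torch
from safetensors.torch import load_file

# Merge sharded .safetensors files into a single pytorch_model.bin.
# Paths are illustrative assumptions, not OpenHelix's real layout.
state_dict = {}
for shard in sorted(glob.glob("checkpoint/model-*.safetensors")):
    state_dict.update(load_file(shard))  # each shard holds a disjoint subset of tensors

torch.save(state_dict, "checkpoint/pytorch_model.bin")
```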
Highlighted Details
Maintenance & Community
The OpenHelix team commits to long-term maintenance. Contact is available via email. The project was initially released in April 2025, with the paper and checkpoints following in May 2025.
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and closed-source linking.
Limitations & Caveats
The project is in its initial release phase, with planned model updates and deployment on real and humanoid robots, including inter-robot collaboration. Evaluation is currently limited to the CALVIN benchmark.