Robotic manipulation with a dual-system VLA model
OpenHelix is an open-source implementation of a dual-system Vision-Language-Action (VLA) model for robotic manipulation. It addresses the challenge of enabling robots to understand and execute complex tasks from natural language instructions and visual input. The project targets researchers and developers in robotics and AI, offering a state-of-the-art (SOTA) model that performs strongly on benchmarks such as CALVIN ABC-D.
How It Works
OpenHelix uses a dual-system approach that pairs a Multimodal Large Language Model (MLLM) with a diffusion-based policy model: the MLLM processes the language instruction and visual input, while the diffusion policy generates robot actions. This split yields robust task understanding and precise action generation, and training the MLLM with a prompt-tuning method plus auxiliary tasks further improves performance, especially with longer execution horizons (EP_LEN=360) and delayed, asynchronous action settings (Asy(10)).
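To make the dual-system flow concrete, here is a minimal PyTorch sketch, emphatically not the OpenHelix code: the ToyMLLM and ToyDiffusionPolicy classes, their dimensions, and the fixed Euler-style denoising loop are illustrative assumptions standing in for the real MLLM backbone and learned diffusion schedule.

```python
import torch
import torch.nn as nn

class ToyMLLM(nn.Module):
    """Stand-in for the MLLM: fuses image and text features into one task latent."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, img_feat, txt_feat):
        return self.proj(torch.cat([img_feat, txt_feat], dim=-1))

class ToyDiffusionPolicy(nn.Module):
    """Stand-in for the diffusion head: predicts noise on an action sequence."""
    def __init__(self, action_dim=7, latent_dim=64):
        super().__init__()
        self.net = nn.Linear(action_dim + latent_dim + 1, action_dim)

    def forward(self, actions, t, cond):
        t_emb = torch.full(actions.shape[:-1] + (1,), float(t))  # scalar timestep per action
        c = cond.unsqueeze(1).expand(-1, actions.shape[1], -1)   # broadcast latent over horizon
        return self.net(torch.cat([actions, c, t_emb], dim=-1))

mllm, policy = ToyMLLM(), ToyDiffusionPolicy()
cond = mllm(torch.randn(1, 64), torch.randn(1, 64))  # one fused image+instruction latent
actions = torch.randn(1, 10, 7)                      # horizon of 10 actions, 7-DoF each
for t in reversed(range(16)):                        # crude fixed-step denoising loop
    actions = actions - policy(actions, t, cond) / 16.0
```

The design point is the split itself: the slow, semantically rich MLLM can refresh the conditioning latent at a lower rate while the lightweight policy head generates actions at control frequency, which is why delayed settings like Asy(10) are a natural stress test.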
Quick Start & Requirements
Key dependencies include diffusers, dgl, and flash-attn. Merging the released .safetensors checkpoint shards into a single pytorch_model.bin is necessary.
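The shard-merging step can be done with the safetensors library. A minimal sketch follows; the checkpoint/model-*.safetensors glob and output path are assumptions for illustration, not the project's actual file layout.

```python
import glob

import torch
from safetensors.torch import load_file

# Merge sharded .safetensors files into a single pytorch_model.bin.
# Paths are illustrative assumptions, not OpenHelix's real layout.
state_dict = {}
for shard in sorted(glob.glob("checkpoint/model-*.safetensors")):
    state_dict.update(load_file(shard))  # each shard holds a disjoint subset of tensors

torch.save(state_dict, "checkpoint/pytorch_model.bin")
```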
Highlighted Details
Maintenance & Community
The OpenHelix team commits to long-term maintenance. Contact is available via email. The project was initially released in April 2025, with the paper and checkpoints following in May 2025.
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and closed-source linking.
Limitations & Caveats
The project is in its initial release phase, with planned model updates and deployment on real and humanoid robots, including inter-robot collaboration. Evaluation is currently limited to the CALVIN benchmark.