OpenHelix by OpenHelix-Team

Robotic manipulation with a dual-system VLA model

Created 6 months ago
296 stars

Top 89.4% on SourcePulse

Project Summary

OpenHelix is an open-source implementation of a dual-system Vision-Language-Action (VLA) model for robotic manipulation. It addresses the challenge of enabling robots to understand and execute complex tasks from natural language instructions and visual input. The project targets researchers and developers in robotics and AI, and reports state-of-the-art (SOTA) results among dual-system VLA models on the CALVIN ABC-D benchmark.

How It Works

OpenHelix takes a dual-system approach that pairs a Multimodal Large Language Model (MLLM) with a diffusion-based policy model. The MLLM processes language instructions and visual information, and the diffusion policy generates robot actions. This split is intended to give robust task understanding and precise action generation. The MLLM is trained with a "prompt tuning" method, optionally combined with auxiliary tasks; both are reported to improve performance, especially with longer execution sequences (EP_LEN=360) and delayed actions (Asy(10)).
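The division of labor between the two systems can be illustrated with a minimal control-loop sketch. This is not the project's code: `slow_system` and `fast_policy` are hypothetical stand-ins for the MLLM and the diffusion policy, and reading EP_LEN as the per-episode step budget and Asy(10) as "refresh the MLLM output once every 10 policy steps" are both assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Observation:
    step: int
    instruction: str


def slow_system(obs: Observation) -> List[float]:
    """Hypothetical stand-in for the MLLM: returns a latent for the policy."""
    # A real MLLM would embed the image and instruction; this is a toy value.
    return [float(obs.step), float(len(obs.instruction))]


def fast_policy(obs: Observation, latent: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the diffusion policy: returns an action."""
    # A real policy would denoise an action conditioned on the latent.
    return [latent[0], float(obs.step)]


def run_episode(
    instruction: str, ep_len: int, refresh_every: int
) -> Tuple[List[List[float]], List[int]]:
    """Run the fast policy every step; refresh the slow latent periodically."""
    latent: Optional[List[float]] = None
    actions: List[List[float]] = []
    refresh_steps: List[int] = []
    for step in range(ep_len):
        obs = Observation(step, instruction)
        if step % refresh_every == 0:  # assumed meaning of Asy(N)
            latent = slow_system(obs)
            refresh_steps.append(step)
        # Between refreshes the policy acts on a latent that may be stale.
        actions.append(fast_policy(obs, latent))
    return actions, refresh_steps


if __name__ == "__main__":
    actions, refreshes = run_episode("open the drawer", ep_len=360, refresh_every=10)
    print(len(actions), len(refreshes))
```

Under these assumptions the policy acts at every step while the slow system runs only a tenth as often, which is why a stale latent between refreshes matters for longer episodes.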

Quick Start & Requirements

  • Installation: Requires creating a conda environment with Python 3.8, installing CALVIN locally, cloning the OpenHelix repository, and installing dependencies including diffusers, dgl, and flash-attn.
  • Prerequisites: CUDA 11.8 is recommended for flash-attn.
  • Data Preparation: Involves downloading CALVIN play demonstrations and packaging them for training. Language instructions can be encoded using a CLIP Text Encoder or downloaded pre-encoded.
  • Checkpoints: Model weights are available on Hugging Face. The safetensors shards must be merged into a single pytorch_model.bin.
  • Links: arXiv Paper, Hugging Face Checkpoints, CALVIN Repo, DGL, FlashAttention.
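The shard-merging step mentioned under Checkpoints could look roughly like the sketch below. It assumes the checkpoint follows the common Hugging Face sharded layout, with a `model.safetensors.index.json` file whose `weight_map` names each shard; whether the OpenHelix checkpoint uses exactly this layout is an assumption, and the repository may provide its own merge script. The helper names `shard_files` and `merge_shards` are hypothetical. `load_file` from `safetensors.torch` and `torch.save` are real library calls.

```python
import json
from pathlib import Path
from typing import List


def shard_files(
    checkpoint_dir: str, index_name: str = "model.safetensors.index.json"
) -> List[Path]:
    """List the unique shard paths named in a Hugging Face style index file."""
    root = Path(checkpoint_dir)
    index = json.loads((root / index_name).read_text())
    # weight_map maps parameter name -> shard file name; shards repeat often.
    names = sorted(set(index["weight_map"].values()))
    return [root / name for name in names]


def merge_shards(checkpoint_dir: str, out_name: str = "pytorch_model.bin") -> Path:
    """Load every shard and save one combined state dict (assumed layout)."""
    # Imported lazily so shard_files can be used without these packages.
    import torch
    from safetensors.torch import load_file

    state_dict = {}
    for shard in shard_files(checkpoint_dir):
        state_dict.update(load_file(str(shard)))  # loads tensors on CPU
    out_path = Path(checkpoint_dir) / out_name
    torch.save(state_dict, out_path)
    return out_path
```

Calling `merge_shards("path/to/checkpoint")` would write the merged file next to the shards. The whole state dict is held in memory at once, so the machine needs enough RAM for the full model.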

Highlighted Details

  • Achieves SOTA performance among dual-system VLA models on CALVIN ABC-D with EP_LEN=360.
  • Offers two configurations: MLLM (PT) + Policy(P) and MLLM (PT) + Aux + Policy(P).
  • Provides scripts for training and evaluation.
  • Includes a detailed BibTeX entry for citation.

Maintenance & Community

The OpenHelix team commits to long-term maintenance. Contact is available via email. The project was initially released in April 2025, with the paper and checkpoints following in May 2025.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

The project is in its initial release phase, with plans for further model updates and deployment on real and humanoid robots, including inter-robot collaboration. The current evaluation is focused on the CALVIN dataset.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 31 stars in the last 30 days
