OpenEAI-VLA by eai-yeslab

Embodied AI VLA platform

Created 5 months ago

480 stars

Top 63.1% on SourcePulse

Project Summary

OpenEAI-VLA is an open-source, unified hardware-software platform for embodied artificial intelligence, targeting real-world manipulation tasks. It aims to reduce the complexity and cost of developing, reproducing, and scaling embodied AI systems by offering a complete pipeline from robot hardware designs to policy deployment. It is intended for researchers and engineers who build and deploy manipulation policies on physical robots.

How It Works

The core of OpenEAI-VLA is a two-stage vision-language-action (VLA) policy training recipe: large-scale pretraining on diverse public robot datasets, followed by task-specific fine-tuning on a small set of demonstrations, optionally augmented with multimodal data. Cross-dataset alignment adapters harmonize the heterogeneous state and action conventions of the different data sources. For deployment, a standard robot-client/policy-server interface streams observations to the policy in real time and returns chunks of actions to the robot.
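To make the idea of an alignment adapter concrete, here is a minimal sketch in Python. It is not the project's implementation: the unified slot layout, the `DatasetSpec` class, the `align_action` function, and the example dataset are all hypothetical names chosen for illustration. The sketch only shows one plausible approach, which is to map each dataset's raw action vector into a shared layout, rescale units, and mark which slots were actually provided.

```python
from dataclasses import dataclass

# Hypothetical unified action layout (an assumption, not taken from the project):
# six end-effector deltas plus one gripper command.
UNIFIED_SLOTS = ("dx", "dy", "dz", "droll", "dpitch", "dyaw", "gripper")


@dataclass(frozen=True)
class DatasetSpec:
    """Describes how one dataset lays out its raw action vector (illustrative)."""
    name: str
    slots: tuple[str, ...]    # name of each raw dimension, in order
    scale: dict[str, float]   # per-slot multiplier to reach the unified units


def align_action(raw: list[float], spec: DatasetSpec) -> tuple[list[float], list[bool]]:
    """Map a raw action into the unified layout.

    Returns (values, mask): values has one entry per unified slot, and mask is
    True where the source dataset actually supplied that slot.
    """
    if len(raw) != len(spec.slots):
        raise ValueError(
            f"{spec.name}: expected {len(spec.slots)} dims, got {len(raw)}"
        )
    # Rescale each raw dimension and index it by slot name.
    by_name = {
        slot: value * spec.scale.get(slot, 1.0)
        for slot, value in zip(spec.slots, raw)
    }
    values = [by_name.get(slot, 0.0) for slot in UNIFIED_SLOTS]
    mask = [slot in by_name for slot in UNIFIED_SLOTS]
    return values, mask


# Example: a made-up dataset that records translation in millimetres and has
# no rotation channels.
spec_mm = DatasetSpec(
    name="example_arm_mm",
    slots=("dx", "dy", "dz", "gripper"),
    scale={"dx": 0.001, "dy": 0.001, "dz": 0.001},
)
values, mask = align_action([10.0, 0.0, -5.0, 1.0], spec_mm)
```

The mask lets a training loss ignore slots that a given dataset never recorded, rather than treating the zero padding as a real target. Whether the project handles missing dimensions this way is not stated in the summary.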

Quick Start & Requirements

Installation requires Python 3.10 or higher. Recommended setup involves creating a conda environment:

conda create -n openeai python=3.10 -y
conda activate openeai
pip install -r requirements.txt
pip install -e .

To prepare datasets, place the data in the data/ directory; processed versions are available at OpenEAI/OpenEAI-Dataset (~3.12 TB). Several base models are required: t5-v1_1-xl, siglip-so400m-patch14-384, paligemma-3b-pt-224, Qwen2.5-VL-3B-Instruct, and Qwen3-VL-4B-Instruct. All of them can be downloaded from Hugging Face. Postprocessing scripts in data_utils/ convert raw datasets to HDF5 format.
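Before starting a long training run, it can help to check the prerequisites listed above. The following Python sketch is one way to do that under stated assumptions: the `checkpoints` directory name, the helper function names, and the Hugging Face organization prefixes in `BASE_MODELS` are not given in the summary and should be verified against the project README. The download helper relies on `snapshot_download` from the `huggingface_hub` package, which must be installed separately, and some of these models may require accepting a license on Hugging Face first.

```python
import sys
from pathlib import Path

# Base models named in the text, mapped to Hugging Face repo ids.
# The organization prefixes are assumptions; confirm them before downloading.
BASE_MODELS = {
    "t5-v1_1-xl": "google/t5-v1_1-xl",
    "siglip-so400m-patch14-384": "google/siglip-so400m-patch14-384",
    "paligemma-3b-pt-224": "google/paligemma-3b-pt-224",
    "Qwen2.5-VL-3B-Instruct": "Qwen/Qwen2.5-VL-3B-Instruct",
    "Qwen3-VL-4B-Instruct": "Qwen/Qwen3-VL-4B-Instruct",
}


def python_ok(version=sys.version_info) -> bool:
    """Return True if the interpreter meets the stated Python 3.10+ requirement."""
    return tuple(version[:2]) >= (3, 10)


def missing_models(model_root: Path) -> list[str]:
    """List base models that have no directory under model_root."""
    return [name for name in BASE_MODELS if not (model_root / name).is_dir()]


def download_missing(model_root: Path) -> None:
    """Fetch every missing model into model_root/<name> (needs network access)."""
    # Imported lazily so the checks above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for name in missing_models(model_root):
        snapshot_download(repo_id=BASE_MODELS[name], local_dir=model_root / name)


if __name__ == "__main__":
    root = Path("checkpoints")  # hypothetical location for base models
    print("Python version OK:", python_ok())
    print("Missing base models:", missing_models(root))
```

Calling `download_missing(Path("checkpoints"))` would then fetch only the models that are not already present locally.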

Highlighted Details

Fully open-source end-to-end pipeline covering hardware designs, control stack, dataset processing, training, and deployment.
Reproducible two-stage training methodology: large-scale pretraining followed by few-shot, task-specific fine-tuning.
Cross-dataset alignment adapters for unifying heterogeneous state/action conventions.
Real-time deployment capabilities via a client/server interface for streaming observations and returning action chunks.
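The client/server deployment pattern in the last item can be illustrated with a small in-process Python sketch. It is an assumption-laden illustration, not the project's interface: `ChunkedClient`, `dummy_policy`, and the observation format are hypothetical, and a plain function call stands in for the network transport, which the summary does not describe. The point is only the control flow, where the client asks the policy for a chunk of actions, executes them one per control step, and requests a new chunk once the current one is used up.

```python
from collections import deque
from typing import Callable

Observation = dict[str, float]
Action = list[float]


class ChunkedClient:
    """Robot-side helper that consumes action chunks one step at a time."""

    def __init__(self, request_chunk: Callable[[Observation], list[Action]]):
        # request_chunk stands in for a call to a remote policy server.
        self._request_chunk = request_chunk
        self._pending: deque[Action] = deque()
        self.requests_sent = 0

    def next_action(self, observation: Observation) -> Action:
        """Return the next action, querying the policy only when the chunk is empty."""
        if not self._pending:
            chunk = self._request_chunk(observation)
            if not chunk:
                raise RuntimeError("policy returned an empty action chunk")
            self._pending.extend(chunk)
            self.requests_sent += 1
        return self._pending.popleft()


def dummy_policy(observation: Observation, chunk_size: int = 4) -> list[Action]:
    """Placeholder policy: nudges a single joint target forward each step."""
    base = observation["joint_0"]
    return [[base + 0.01 * step] for step in range(1, chunk_size + 1)]


# Six control steps with a chunk size of four need two policy requests.
client = ChunkedClient(dummy_policy)
actions = [client.next_action({"joint_0": 0.0}) for _ in range(6)]
```

Returning several actions per request amortizes inference and network latency over multiple control steps, which is the usual motivation for action chunking. A real deployment would also need to decide when to discard a stale chunk, which this sketch leaves out.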

Maintenance & Community

Contributions are managed through GitHub issues and pull requests. For inquiries, contact ynylincoln@sjtu.edu.cn. The README does not list community channels such as Discord or Slack.

Licensing & Compatibility

The project is licensed under the BSD-3-Clause License, which generally permits commercial use and modification with attribution.

Limitations & Caveats

Some resources, including CAD packages, manufacturing files, large-scale configurations, and pretrained checkpoints, have not been released yet because of their large upload size; they are planned for a future release. Multi-node, multi-GPU training requires a high-speed interconnect between nodes, preferably InfiniBand (IB), for good performance.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days