eai-yeslab: Embodied AI VLA platform
Top 62.4% on SourcePulse
OpenEAI-VLA provides an open-source, unified hardware-software platform for embodied artificial intelligence, specifically targeting real-world manipulation tasks. It aims to reduce the complexity and cost of developing, reproducing, and scaling embodied AI systems by offering a complete pipeline from robot hardware designs to policy deployment. The platform is beneficial for researchers and engineers seeking to build and deploy sophisticated AI agents capable of interacting with the physical world.
How It Works
The core of OpenEAI-VLA is a two-stage vision-language-action (VLA) policy training recipe. This involves large-scale pretraining on diverse public robot datasets, followed by task-specific fine-tuning using a minimal set of demonstrations, optionally augmented with multimodal data. The system incorporates cross-dataset alignment adapters to harmonize heterogeneous state and action conventions from various data sources. For deployment, it offers a standard robot-client/policy-server interface enabling real-time streaming of observations and action chunk generation.
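To make the idea of a cross-dataset alignment adapter concrete, here is a minimal sketch of how heterogeneous action conventions could be mapped into one shared layout. It is not taken from the OpenEAI-VLA code: the unified layout (six joint angles in radians plus one gripper value in [0, 1]), the `AlignmentAdapter` class, and the dataset names `dataset_a` and `dataset_b` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical unified layout: 6 joint angles (radians) + 1 gripper value in [0, 1].
UNIFIED_DIM = 7


@dataclass(frozen=True)
class AlignmentAdapter:
    """Maps one dataset's raw action vector into the unified layout (illustrative only)."""

    joint_indices: Sequence[int]  # positions of the joint values in the raw vector
    gripper_index: int            # position of the gripper value in the raw vector
    joints_in_degrees: bool       # True if the source stores angles in degrees
    gripper_max: float            # raw value that corresponds to "fully open"

    def to_unified(self, raw: Sequence[float]) -> List[float]:
        joints = [float(raw[i]) for i in self.joint_indices]
        if self.joints_in_degrees:
            joints = [math.radians(j) for j in joints]
        # Scale the gripper to [0, 1] and clamp out-of-range readings.
        gripper = min(max(raw[self.gripper_index] / self.gripper_max, 0.0), 1.0)
        return joints + [gripper]


# Two made-up source conventions, keyed by made-up dataset names.
ADAPTERS: Dict[str, AlignmentAdapter] = {
    # Joints first, in degrees; gripper last, as a 0-100 percentage.
    "dataset_a": AlignmentAdapter(range(0, 6), 6, joints_in_degrees=True, gripper_max=100.0),
    # Gripper first, as an opening width in metres; joints after, in radians.
    "dataset_b": AlignmentAdapter(range(1, 7), 0, joints_in_degrees=False, gripper_max=0.08),
}


def align(dataset: str, raw: Sequence[float]) -> List[float]:
    """Convert a raw action from the named dataset into the unified layout."""
    unified = ADAPTERS[dataset].to_unified(raw)
    assert len(unified) == UNIFIED_DIM
    return unified
```

With adapters like these, samples from both sources can be mixed in one pretraining batch, since every action shares the same ordering, units, and gripper range. The real platform may use learned adapters or a different unified representation.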
Quick Start & Requirements
Installation requires Python 3.10 or higher. Recommended setup involves creating a conda environment:
conda create -n openeai python=3.10 -y
conda activate openeai
pip install -r requirements.txt
pip install -e .
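Before running the install commands in an existing environment, it can help to confirm that the interpreter meets the stated Python 3.10 minimum. The following is a small sketch; the helper name `python_is_supported` is hypothetical and not part of the project.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


# Report on the interpreter that is currently running this script.
print("supported" if python_is_supported() else "unsupported")
```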
To prepare datasets, place the data in the data/ directory; processed versions are available at OpenEAI/OpenEAI-Dataset (~3.12 TB). Five base models are required: t5-v1_1-xl, siglip-so400m-patch14-384, paligemma-3b-pt-224, Qwen2.5-VL-3B-Instruct, and Qwen3-VL-4B-Instruct. All of them can be downloaded from Hugging Face. Data postprocessing scripts in data_utils/ convert raw datasets to HDF5 format.
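The base models can be fetched with the `huggingface_hub` library. The sketch below makes several assumptions: the organization prefixes in the repository IDs (`google/`, `Qwen/`) are inferred from the model names and should be checked against the README, the `checkpoints` target directory is a placeholder, and some repositories may be gated and need an authenticated Hugging Face account.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: org prefixes are inferred from the model names, not stated in the source.
BASE_MODELS = [
    "google/t5-v1_1-xl",
    "google/siglip-so400m-patch14-384",
    "google/paligemma-3b-pt-224",
    "Qwen/Qwen2.5-VL-3B-Instruct",
    "Qwen/Qwen3-VL-4B-Instruct",
]


def plan_downloads(root: str = "checkpoints") -> List[Tuple[str, Path]]:
    """Pair each repo ID with a local directory named after the model."""
    return [(repo_id, Path(root) / repo_id.split("/")[-1]) for repo_id in BASE_MODELS]


def download_all(root: str = "checkpoints") -> None:
    """Download every base model into its planned directory."""
    # Imported lazily so plan_downloads() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, target in plan_downloads(root):
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `plan_downloads()` first shows where each model would land without touching the network; `download_all()` then performs the actual transfer. Where the project expects the models to live on disk is not stated here, so adjust `root` to match its configuration.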
Maintenance & Community
Contributions are managed via GitHub issues and pull requests. For inquiries, users can contact ynylincoln@sjtu.edu.cn. Specific community channels like Discord or Slack are not detailed in the README.
Licensing & Compatibility
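The project is licensed under the BSD-3-Clause License, which permits commercial use, modification, and redistribution provided the copyright notice and license text are retained. The names of the project and its contributors may not be used to endorse derived products without permission.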
Limitations & Caveats
Certain resources, including CAD packages, manufacturing files, large-scale configurations, and pretrained checkpoints, are slated for future release due to their substantial upload volume. Multi-node, multi-GPU training necessitates a high-speed inter-node communication setup, preferably InfiniBand (IB), for optimal performance.
2 months ago
Inactive