EO1 by EO-Robotics

Unified embodied foundation model for robot control and multimodal reasoning

Created 3 months ago

273 stars

Top 94.6% on SourcePulse

Project Summary

EO-1 is an open-source series of unified embodied foundation models for general robot control. It targets researchers and engineers in robotics and embodied AI who need perception, planning, reasoning, and action handled by one system rather than by separate modules. A single architecture covers both multimodal reasoning and robot control.

How It Works

The core of EO-1 is a 3B-parameter, decoder-only transformer. It is trained with interleaved pretraining, which combines discrete autoregressive decoding with continuous flow-matching denoising, so that language, vision, and action are modeled by the same network. The training data includes EO-Data1.5M, web multimodal data, and several robot control datasets. The project describes the result as reasoning-enhanced generalization for real-world robot control.
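The combination of the two objectives described above can be illustrated with a small NumPy sketch. This is not EO-1's implementation: the function names, the linear noise-to-action path (t = 0 is noise, t = 1 is the action), the Euler sampler, and the `fm_weight` term are all assumptions chosen to show the general technique of pairing a next-token cross-entropy loss with a flow-matching regression loss.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean next-token cross-entropy for discrete (text) positions.

    logits: (N, V) array, targets: (N,) integer token ids.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def flow_matching_pair(actions, noise, t):
    """Build the noisy input x_t and target velocity on a linear path.

    Assumed convention: x_t = (1 - t) * noise + t * actions.
    """
    t = t[:, None]
    x_t = (1.0 - t) * noise + t * actions
    target_velocity = actions - noise
    return x_t, target_velocity


def flow_matching_loss(pred_velocity, target_velocity):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((pred_velocity - target_velocity) ** 2))


def joint_loss(logits, token_targets, pred_velocity, target_velocity, fm_weight=1.0):
    """Hypothetical weighted sum of the two objectives."""
    return cross_entropy(logits, token_targets) + fm_weight * flow_matching_loss(
        pred_velocity, target_velocity
    )


def euler_denoise(velocity_fn, noise, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (action)."""
    x = noise.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

In a real model, `velocity_fn` would be the transformer's continuous action head conditioned on the interleaved text and image context; here it is left as a plain callable so the sketch stays self-contained.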

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.10 conda environment, and installing the dependencies, including flash-attn. NVIDIA H100/H800 GPUs with CUDA 12.8 are the recommended setup for building flash-attn from source. Inference requires approximately 6.5 GB of GPU memory. The project builds on HuggingFace Transformers and LeRobot for deployment and integration.
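As a rough sanity check on the memory figure, the weight footprint of a 3B-parameter model can be estimated from the bytes used per parameter. The sketch below assumes exactly 3e9 parameters and that inference runs in a 16-bit format; neither detail is stated in the summary, and `weight_memory_gb` is a hypothetical helper, not part of the project.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 3e9  # assumed: "3B" taken as exactly 3 billion parameters

for dtype_name, nbytes in [("float32", 4), ("bfloat16/float16", 2), ("int8", 1)]:
    print(f"{dtype_name}: {weight_memory_gb(NUM_PARAMS, nbytes):.1f} GB")
```

Under these assumptions, 16-bit weights account for about 6 GB, which would leave roughly 0.5 GB of the quoted 6.5 GB for activations and caches.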

Highlighted Details

Unified Architecture: A single decoder-only transformer handles text, image, video, and action modalities.
EO-Data1.5M Dataset: Uses a 1.5M-sample dataset curated for interleaved physical, reasoning, and control tasks.
Interleaved Pretraining: Combines autoregressive and flow matching techniques for enhanced language-action synergy.
Performance: Demonstrates strong results on various robot control benchmarks (LIBERO, Simpler) and multimodal reasoning tasks (EO-Bench, RoboVQA), often outperforming larger models.

Maintenance & Community

The project is actively maintained, welcomes contributions, and has a community Discord server. Integration into the LeRobot main branch is complete. Remaining roadmap items include releasing pre-training models, the EO-Data1.5M dataset, and the EO-Bench benchmark.

Licensing & Compatibility

The provided README does not specify a software license, which may impact commercial use or integration into closed-source projects.

Limitations & Caveats

According to the roadmap, pre-training models and the full EO-Data1.5M dataset are not yet publicly released. Building flash-attn from source is recommended on high-end NVIDIA hardware (H100/H800) with CUDA 12.8, which may limit who can reproduce the recommended setup. The absence of a stated license is a significant caveat for adoption.

Explore Similar Projects

vla0 by NVlabs

Awesome-VLA-Papers by Psi-Robot

Hybrid-VLA by PKU-HMI-Lab

X-VLA by 2toinf

RoboBrain by FlagOpen

Awesome-Embodied-AI by haoranD

RDT2 by thu-ml

molmoact by allenai

SpatialVLA by SpatialVLA

cosmos-reason1 by nvidia-cosmos

octo by octo-models

Isaac-GR00T by NVIDIA