EO1 by EO-Robotics

Unified embodied foundation model for robot control and multimodal reasoning

Created 1 month ago
257 stars

Top 98.4% on SourcePulse

Project Summary

EO-1 is an open-source series of unified embodied foundation models for general robot control, aimed at researchers and engineers in robotics and embodied AI. It addresses the challenge of integrating perception, planning, reasoning, and action into a single, coherent system. With its unified architecture, one model handles both multimodal reasoning and robot control instead of splitting them across separate systems.

How It Works

The core of EO-1 is a 3B-parameter, decoder-only transformer. It is trained with interleaved pretraining, which combines discrete autoregressive decoding with continuous flow matching denoising so that language, vision, and action modalities are learned jointly in one model. The training data includes EO-Data1.5M, web multimodal data, and various robot control datasets. According to the project, this reasoning-enhanced training improves generalization and yields robust performance in real-world robot control.
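
The summary does not spell out EO-1's exact flow matching formulation. As a rough illustration only, the sketch below shows a common linear-path (rectified-flow-style) variant applied to a continuous action chunk. Training regresses a velocity field, and inference integrates that field from noise with Euler steps. All function names are hypothetical, and EO-1's actual noise schedule, time sampling, and number of denoising steps may differ.

```python
import numpy as np

# Generic linear-path flow matching on a continuous action chunk.
# Assumption: this is NOT EO-1's published code; all names are hypothetical.


def flow_matching_pair(actions, rng):
    """Build one training example: noisy interpolant x_t, time t, target velocity."""
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform()
    # Straight-line path from pure noise (t=0) to the clean actions (t=1).
    x_t = (1.0 - t) * noise + t * actions
    # Along this path the velocity is constant: d(x_t)/dt = actions - noise.
    target_velocity = actions - noise
    return x_t, t, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))


def sample_actions(velocity_fn, shape, rng, num_steps=10):
    """Denoise by integrating dx/dt = v(x, t) from t=0 to t=1 with forward Euler."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Usage: an oracle velocity field that points straight at a known action chunk.
# Euler integration should land on that chunk, which checks the sampler's bookkeeping.
rng = np.random.default_rng(0)
target = np.array([[0.1, -0.2], [0.3, 0.0]])


def oracle(x, t):
    # Exact velocity of the straight path through x at time t toward `target`.
    return (target - x) / (1.0 - t)


result = sample_actions(oracle, target.shape, rng)
print(np.allclose(result, target))  # True
```

In a real model, `velocity_fn` would be a network conditioned on the preceding vision and language context. The oracle here only stands in for it to show the sampling loop.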

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.10 conda environment, and installing the dependencies, including flash-attn. NVIDIA H100/H800 GPUs with CUDA 12.8 are recommended when building flash-attn from source. Inference requires approximately 6.5 GB of GPU memory. The project builds on Hugging Face Transformers and LeRobot for deployment and integration.
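
A back-of-the-envelope check on the ~6.5 GB figure follows. It assumes the 3B parameters are held in a 16-bit dtype (bf16 or fp16, 2 bytes each). The summary does not state the dtype, so treat this as an estimate, not a measurement.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in decimal gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~3e9 parameters stored as 16-bit values (2 bytes each).
weights_gb = weight_memory_gb(3e9, 2)
print(f"{weights_gb:.1f} GB")  # 6.0 GB

# The rest of the quoted ~6.5 GB would cover activations, caches, and buffers.
print(f"{6.5 - weights_gb:.1f} GB of headroom")  # 0.5 GB of headroom

# For comparison, full fp32 weights (4 bytes each) alone would not fit in 6.5 GB.
print(f"{weight_memory_gb(3e9, 4):.1f} GB")  # 12.0 GB
```

The quoted figure is consistent with 16-bit weights plus a small working overhead. Full-precision weights would need roughly twice as much memory.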

Highlighted Details

  • Unified Architecture: A single decoder-only transformer handles text, image, video, and action modalities.
  • EO-Data1.5M Dataset: A 1.5M-sample dataset curated for interleaved physical, reasoning, and control tasks.
  • Interleaved Pretraining: Combines autoregressive and flow matching techniques for enhanced language-action synergy.
  • Performance: Demonstrates strong results on various robot control benchmarks (LIBERO, Simpler) and multimodal reasoning tasks (EO-Bench, RoboVQA), often outperforming larger models.
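
To illustrate what an interleaved, unified sequence could look like, the sketch below packs image, text, and action segments into one sequence. It then routes each position to either the autoregressive objective or the flow matching objective. The modality labels, segment lengths, and routing table are hypothetical. In particular, the summary does not say which positions EO-1 actually supervises, such as whether prompt text or visual tokens carry a loss.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    modality: str  # "text", "image", "video", or "action" (hypothetical labels)
    length: int    # number of positions this segment occupies in the sequence


# Hypothetical routing table: which objective supervises each modality.
LOSS_FOR_MODALITY = {
    "text": "autoregressive",   # next-token cross-entropy on discrete tokens
    "action": "flow_matching",  # denoising loss on continuous action values
    "image": None,              # assumed to act as conditioning only, no loss
    "video": None,
}


def build_loss_masks(segments: List[Segment]) -> Tuple[List[bool], List[bool]]:
    """Return per-position masks for the two objectives over one packed sequence."""
    ar_mask: List[bool] = []
    fm_mask: List[bool] = []
    for seg in segments:
        objective = LOSS_FOR_MODALITY[seg.modality]
        ar_mask.extend([objective == "autoregressive"] * seg.length)
        fm_mask.extend([objective == "flow_matching"] * seg.length)
    return ar_mask, fm_mask


# One interleaved example: observation, instruction, actions, reasoning, actions.
episode = [
    Segment("image", 4),
    Segment("text", 3),
    Segment("action", 2),
    Segment("text", 2),
    Segment("action", 2),
]
ar, fm = build_loss_masks(episode)
print(sum(ar), sum(fm))  # 5 4
```

Keeping both objectives on a single packed sequence means every position can attend to everything before it, whatever its modality. This is one plausible reading of how a single decoder-only model can couple language and action.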

Maintenance & Community

The project is actively maintained, contributions are welcome, and a community Discord server is available. Key integrations, such as merging into the LeRobot main branch, are complete. Future roadmap items include releasing the pre-training models, the EO-Data1.5M dataset, and the EO-Bench benchmark.

Licensing & Compatibility

The provided README does not specify a software license, which may impact commercial use or integration into closed-source projects.

Limitations & Caveats

According to the roadmap, key components such as the pre-training models and the full EO-Data1.5M dataset are not yet publicly released. Building dependencies such as flash-attn from source is recommended on high-end NVIDIA hardware (H100/H800) with a specific CUDA version (12.8). The absence of a stated license is a significant caveat for adoption.

Health Check
Last Commit: 6 days ago
Responsiveness: Inactive
Pull Requests (30d): 5
Issues (30d): 10
Star History: 146 stars in the last 30 days
