embodied-agents by mbodiai

Integrate SOTA AI models into robotics

Created 1 year ago

271 stars

Top 95.1% on SourcePulse

Project Summary

This toolkit simplifies integrating state-of-the-art transformer models into robotics stacks, addressing the high barrier to entry for running complex AI on robotic systems. It targets researchers, hobbyists, and developers, offering a modular, extensible, and efficient framework for building responsive robotic agents and facilitating data collection.

How It Works

The project is structured around three core components: Agents, Data, and Hardware, further categorized by meta-modalities: Language, Motion, and Sense. Agents (LanguageAgent, MotorAgent, SensoryAgent, AutoAgent) expose an act method for local or remote inference and support backends such as OpenAI, Gemini, and OpenVLA. The Sample class serializes data across formats such as Gym spaces, JSON, NumPy, and PyTorch, so the same records can be passed between different ML models and robotics frameworks.
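To make the data-serialization idea concrete, here is a minimal sketch of a Sample-like container. It uses only the standard library and is not the mbodied Sample class: the name ToySample, its fields, and its methods are hypothetical, chosen only to show how one nested record can be turned into a dict, a JSON string, or a flat numeric vector.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ToySample:
    """Hypothetical stand-in for a Sample-like container (not the mbodied class)."""

    observation: dict[str, Any] = field(default_factory=dict)
    action: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        # Plain nested dict, suitable for logging or further conversion.
        return asdict(self)

    def to_json(self) -> str:
        # Sorted keys keep the output stable across runs.
        return json.dumps(self.to_dict(), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ToySample":
        return cls(**json.loads(text))

    def flatten(self) -> list[float]:
        """Collect numeric leaves depth-first, visiting dict keys in sorted order."""
        flat: list[float] = []

        def walk(node: Any) -> None:
            if isinstance(node, dict):
                for key in sorted(node):
                    walk(node[key])
            elif isinstance(node, (list, tuple)):
                for item in node:
                    walk(item)
            elif isinstance(node, (int, float)) and not isinstance(node, bool):
                flat.append(float(node))
            # Non-numeric leaves (strings, None, booleans) are skipped.

        walk(self.to_dict())
        return flat


sample = ToySample(
    observation={"joint_angles": [0.1, 0.2]},
    action={"gripper": 1.0, "delta_xyz": [0.0, 0.05, -0.01]},
)
print(sample.to_json())
print(sample.flatten())
```

A flat vector like this is the kind of input a NumPy or PyTorch conversion would start from, while the JSON form suits logging and transport. The actual library may organize these conversions differently.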

Quick Start & Requirements

Installation is via pip: pip install mbodied. Additional dependencies for features such as audio support can be installed with pip install mbodied[extras]. The project requires Python and common ML libraries such as PyTorch and OpenCV. Official documentation is available, and numerous example scripts and Colab notebooks are provided for quick integration.

Highlighted Details

Native integration with leading AI models including Google Gemini, OpenAI, and OpenVLA for motion control.
Supports advanced sensory processing via agents for depth estimation, object detection (YOLO), and image segmentation (SAM2).
Enables asynchronous and remote agent execution, facilitating real-time responsiveness and scalability.
Features automatic dataset recording directly on robots, with optional uploads to HuggingFace Hub.
Incorporates Retrieval-Augmented Generation (RAG) capabilities for enhanced agent reasoning.
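The act method and the asynchronous, remote execution mentioned above can be illustrated with a small standard-library sketch. Everything here is hypothetical and is not the mbodied API: ToyAgent, the two backend classes, and the async_act method name are assumptions, and a short sleep stands in for network latency to a remote model server.

```python
import asyncio
import time
from typing import Protocol


class Backend(Protocol):
    """Anything that can turn an instruction into a response."""

    def infer(self, instruction: str) -> str: ...


class EchoLocalBackend:
    """Hypothetical local backend; a real one would run a model in-process."""

    def infer(self, instruction: str) -> str:
        return f"local:{instruction}"


class FakeRemoteBackend:
    """Hypothetical remote backend; the sleep imitates a network round trip."""

    def infer(self, instruction: str) -> str:
        time.sleep(0.05)
        return f"remote:{instruction}"


class ToyAgent:
    """Hypothetical agent that delegates to whichever backend it was given."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def act(self, instruction: str) -> str:
        # Blocking call: returns once the backend has produced a response.
        return self.backend.infer(instruction)

    async def async_act(self, instruction: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.act, instruction)


async def main() -> list[str]:
    local = ToyAgent(EchoLocalBackend())
    remote = ToyAgent(FakeRemoteBackend())
    # Both requests are in flight at once; gather preserves argument order.
    return await asyncio.gather(
        local.async_act("wave"),
        remote.async_act("describe scene"),
    )


results = asyncio.run(main())
print(results)
```

Keeping the backend behind a single infer call means the agent code is identical for local and remote inference, which is one plausible way to get the swap-in behavior the summary describes.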

Maintenance & Community

Recent updates (April 2025) include Gemini backend support, tool calling, and RAG functionalities. The project maintains an active community presence via Discord: https://discord.gg/BPQ7FEGxNb. Users are encouraged to star the GitHub repository.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

The framework is experimental and under active development, with potential for breaking changes. It currently does not support learning from in-context experience, and RAG implementations for embodied applications are described as rudimentary. Fine-tuning requires prohibitively large datasets, and online Reinforcement Learning is not yet practical for most use cases.

Explore Similar Projects

Embodied-AI-Paper-TopConf by Songwxuan

Large-VLM-based-VLA-for-Robotic-Manipulation by JiuTian-VL

vla0 by NVlabs

GRID-playground by GenRobo

X-VLA by 2toinf

ChatGPT-Robot-Manipulation-Prompts by microsoft

ROS-LLM by Auromix

reachy_mini by pollen-robotics

Awesome-Robotics-Foundation-Models by robotics-survey

rosa by nasa-jpl

RoboticsDiffusionTransformer by thu-ml

openpi by Physical-Intelligence