embodied-agents  by mbodiai

Integrate SOTA AI models into robotics

Created 1 year ago
255 stars

Top 98.8% on SourcePulse

Project Summary

This toolkit simplifies integrating state-of-the-art transformer models into robotics stacks, addressing the high barrier to entry for running complex AI on robotic systems. It targets researchers, hobbyists, and developers, offering a modular, extensible, and efficient framework for building responsive robotic agents and facilitating data collection.

How It Works

The project is structured around three core components (Agents, Data, and Hardware), which are further categorized by meta-modality: Language, Motion, and Sense. Agents (LanguageAgent, MotorAgent, SensoryAgent, AutoAgent) expose an act method for local or remote inference and support backends such as OpenAI, Gemini, and OpenVLA. The Sample class serializes data to and from formats such as Gym spaces, JSON, NumPy, and PyTorch, so the same records can be passed between ML models and robotics frameworks.
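The two ideas above, an agent exposing a single act method over a swappable backend and a sample container that converts between formats, can be sketched in plain Python. This is an illustration only and does not use the mbodied package: SketchSample, SketchLanguageAgent, and echo_backend are hypothetical names invented here, and the real LanguageAgent and Sample classes may differ in signature and behavior.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class SketchSample:
    """Hypothetical stand-in for a Sample-like container (not mbodied's class)."""

    observation: list[float]
    action: list[float]

    def to_dict(self) -> dict:
        # Plain dict form, suitable for JSON or further conversion.
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), sort_keys=True)

    def flatten(self) -> list[float]:
        # Concatenate all fields into one flat vector, observation first.
        return [*self.observation, *self.action]

    @classmethod
    def from_json(cls, text: str) -> "SketchSample":
        return cls(**json.loads(text))


class SketchLanguageAgent:
    """Hypothetical agent: act() delegates to a pluggable backend callable."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend

    def act(self, instruction: str) -> str:
        # The backend could wrap a local model or a remote API call.
        return self.backend(instruction)


def echo_backend(prompt: str) -> str:
    """Stub backend standing in for a real model (assumption for this sketch)."""
    return f"ack: {prompt}"
```

Because the agent only depends on a callable, swapping a local model for a remote endpoint would not change the calling code; whether mbodied structures its backends this way is not stated in the summary.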

Quick Start & Requirements

Installation is straightforward via pip: pip install mbodied. Additional dependencies for features such as audio support can be installed with pip install mbodied[extras]. The project requires Python and common ML libraries such as PyTorch and OpenCV. Official documentation is available, along with numerous example scripts and Colab notebooks for quick integration.

Highlighted Details

  • Native integration with leading AI models including Google Gemini, OpenAI, and OpenVLA for motion control.
  • Supports advanced sensory processing via agents for depth estimation, object detection (YOLO), and image segmentation (SAM2).
  • Enables asynchronous and remote agent execution, facilitating real-time responsiveness and scalability.
  • Features automatic dataset recording directly on robots, with optional uploads to HuggingFace Hub.
  • Incorporates Retrieval-Augmented Generation (RAG) capabilities for enhanced agent reasoning.
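As a rough illustration of asynchronous execution combined with recording, the sketch below runs a blocking inference stub in worker threads and logs each step as a JSON line. It uses only the Python standard library; slow_act, act_async, run_episode, and record_episode are hypothetical names for this example and are not part of mbodied, whose own async and recording interfaces are not described in this summary.

```python
import asyncio
import json
import time


def slow_act(instruction: str) -> str:
    """Stub for a blocking inference call (hypothetical, not an mbodied API)."""
    time.sleep(0.05)
    return f"done: {instruction}"


async def act_async(instruction: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_act, instruction)


async def run_episode(instructions: list[str]) -> list[str]:
    # Issue all requests concurrently; gather returns results in input order.
    results = await asyncio.gather(*(act_async(i) for i in instructions))
    # Record each (instruction, result) pair as one JSON line.
    return [
        json.dumps({"instruction": i, "result": r})
        for i, r in zip(instructions, results)
    ]


def record_episode(instructions: list[str]) -> list[str]:
    """Synchronous entry point that drives the async episode."""
    return asyncio.run(run_episode(instructions))
```

Calling record_episode(["look left", "grasp"]) returns two JSON strings, one per step, which could then be appended to a dataset file or uploaded elsewhere.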

Maintenance & Community

Recent updates (April 2025) include Gemini backend support, tool calling, and RAG functionalities. The project maintains an active community presence via Discord: https://discord.gg/BPQ7FEGxNb. Users are encouraged to star the GitHub repository.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

The framework is experimental and under active development, with potential for breaking changes. It currently does not support learning from in-context experience, and RAG implementations for embodied applications are described as rudimentary. Fine-tuning requires prohibitively large datasets, and online Reinforcement Learning is not yet practical for most use cases.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

14 stars in the last 30 days
