EmbodiedBench by EmbodiedBench

Benchmarking MLLMs as vision-driven embodied agents

Created 11 months ago
252 stars

Top 99.6% on SourcePulse

View on GitHub
Project Summary

EmbodiedBench is a comprehensive benchmark designed to evaluate Multi-modal Large Language Models (MLLMs) as embodied agents. It addresses the limitations of existing benchmarks by offering fine-grained, capability-oriented assessments across both high-level and low-level tasks. This platform is intended for researchers and engineers developing embodied AI systems, providing actionable insights to advance MLLM-driven agents.

How It Works

EmbodiedBench employs a standardized evaluation platform featuring four distinct environments: EB-ALFRED and EB-Habitat for high-level tasks, and EB-Navigation and EB-Manipulation for low-level tasks. It assesses six critical agent capabilities, including commonsense reasoning, complex instruction following, spatial awareness, visual perception, and long-term planning. The system provides unified, Gym-style APIs for seamless integration and supports evaluation of both proprietary models via APIs and open-source models through local execution or model serving frameworks like LMDeploy. Its flexible configuration options allow for in-depth experimentation with various visual and textual inputs, prompts, and environment feedback.
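To make the Gym-style interaction concrete, here is a minimal sketch of an evaluation loop; the constructor, observation format, and `agent` object are illustrative assumptions, not the repository's exact API:

```python
# Hypothetical sketch of an episode rollout over a Gym-style API.
# The observation/action formats and the agent interface are assumptions;
# consult the EmbodiedBench repo for the real entry points.

def evaluate_episode(env, agent, max_steps=30):
    """Roll out one task episode with an MLLM-backed agent."""
    obs = env.reset()  # obs typically bundles an image and a task instruction
    for _ in range(max_steps):
        action = agent.act(obs)        # the MLLM picks the next action from obs
        obs, reward, done, info = env.step(action)
        if done:
            break
    return info.get("success", False)
```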

Quick Start & Requirements

Installation requires setting up three separate conda environments, one per environment group. Cloning the repository uses Git LFS to pull the large datasets. Key dependencies include Habitat-Sim 0.3.0 (the `withbullet headless` conda build) and CoppeliaSim V4.1.0 for EB-Manipulation on Ubuntu 20.04. Running large open-source models locally demands sufficient GPU memory, and the project provides tensor-parallelism (tp) guidance. Evaluating proprietary models (OpenAI, Gemini, Anthropic, DashScope) requires the corresponding API keys, and experiments on headless servers must first launch the provided headless-server script.
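As an illustration of the local-serving path, the following is a minimal LMDeploy sketch; the model ID and tp value are placeholders, and EmbodiedBench's own launch scripts may wrap this differently:

```python
# Minimal sketch: serving an open-source model locally with LMDeploy.
# The model ID and tensor-parallel degree (tp) are placeholders.
from lmdeploy import pipeline, TurbomindEngineConfig

# tp shards the model across GPUs; pick tp so the weights fit in memory.
pipe = pipeline(
    "Qwen/Qwen2-VL-7B-Instruct",  # placeholder model ID
    backend_config=TurbomindEngineConfig(tp=2),
)
print(pipe(["Describe the scene and choose the next action."]))
```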

Highlighted Details

  • Features 1,128 testing tasks across four diverse environments.
  • Includes six specialized subsets for evaluating core agent capabilities.
  • Offers unified Gym-style APIs for all embodied environments.
  • Supports both remote API calls for proprietary models and local execution for open-source models.
  • Provides configurable textual and visual input designs for detailed experimentation.
  • Introduces a truncate feature for managing long conversation histories in navigation tasks (see the sketch after this list).
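A plausible shape for the history-truncation behavior mentioned above; the function name, message format, and keep-count are illustrative, not the repository's actual implementation:

```python
# Illustrative sketch of truncating a long conversation history.
# Not EmbodiedBench's actual implementation; the message format
# and keep-count are assumptions.

def truncate_history(messages, keep_last=4):
    """Keep the initial task prompt plus the most recent exchanges."""
    if len(messages) <= keep_last + 1:
        return messages
    # Preserve the first message (task instruction) and the latest turns.
    return [messages[0]] + messages[-keep_last:]
```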

Maintenance & Community

The accompanying paper was accepted to ICML 2025, indicating recent academic relevance. News updates in 2025 highlight the release of training recipes and trajectory datasets. The README provides no dedicated community channels (e.g., Discord, Slack) and no contributor information beyond the author list.

Licensing & Compatibility

The README does not state a software license for EmbodiedBench, so compatibility with commercial or closed-source use requires further investigation.

Limitations & Caveats

Setting up EmbodiedBench involves a complex multi-environment installation process and requires downloading substantial datasets. The installation of CoppeliaSim for EB-Manipulation is noted as specific to Ubuntu 20.04. The README does not specify the project's license, posing a potential adoption blocker. Users are advised against enabling multiple advanced input flags (visual_icl, multiview, multistep, chat_history) simultaneously due to potential conflicts.
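For instance, a guard like the following (hypothetical; the flag names are taken from the README's warning, but the check itself is not part of the project) could reject conflicting configurations up front:

```python
# Hypothetical config guard: reject combinations of advanced input flags
# that the README warns against enabling together.
ADVANCED_FLAGS = ("visual_icl", "multiview", "multistep", "chat_history")

def check_flags(config: dict) -> None:
    enabled = [f for f in ADVANCED_FLAGS if config.get(f)]
    if len(enabled) > 1:
        raise ValueError(f"Enable at most one of {ADVANCED_FLAGS}, got {enabled}")
```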

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 30 days

