EmbodiedBench
Benchmarking MLLMs as vision-driven embodied agents
EmbodiedBench is a comprehensive benchmark designed to evaluate Multi-modal Large Language Models (MLLMs) as embodied agents. It addresses the limitations of existing benchmarks by offering fine-grained, capability-oriented assessments across both high-level and low-level tasks. This platform is intended for researchers and engineers developing embodied AI systems, providing actionable insights to advance MLLM-driven agents.
How It Works
EmbodiedBench employs a standardized evaluation platform featuring four distinct environments: EB-ALFRED and EB-Habitat for high-level tasks, and EB-Navigation and EB-Manipulation for low-level tasks. It assesses six critical agent capabilities, including commonsense reasoning, complex instruction following, spatial awareness, visual perception, and long-term planning. The system provides unified, Gym-style APIs for seamless integration and supports evaluation of both proprietary models via APIs and open-source models through local execution or model serving frameworks like LMDeploy. Its flexible configuration options allow for in-depth experimentation with various visual and textual inputs, prompts, and environment feedback.
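As a rough illustration of what a Gym-style evaluation loop looks like, the sketch below runs one episode against a toy environment. ToyEmbodiedEnv, scripted_agent, and run_episode are hypothetical names invented for this example and are not EmbodiedBench's actual classes; the sketch only assumes the classic Gym convention of reset() returning an observation and step() returning (observation, reward, done, info).

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, object]
StepOutput = Tuple[Observation, float, bool, Dict[str, object]]


class ToyEmbodiedEnv:
    """Hypothetical stand-in for a Gym-style embodied environment."""

    def __init__(self, instruction: str = "open the drawer", max_steps: int = 5) -> None:
        self.instruction = instruction
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Observation:
        self.steps = 0
        # A real environment would return a rendered camera frame here.
        return {"instruction": self.instruction, "image": None}

    def step(self, action: str) -> StepOutput:
        self.steps += 1
        success = action == "open drawer"
        # The episode ends on success or when the step budget is spent.
        done = success or self.steps >= self.max_steps
        obs = {"instruction": self.instruction, "image": None}
        info = {"task_success": success, "feedback": "ok" if success else "no effect"}
        return obs, float(success), done, info


def scripted_agent(obs: Observation, history: List[str]) -> str:
    """Placeholder for an MLLM call: replays a fixed three-step plan."""
    plan = ["find drawer", "move to drawer", "open drawer"]
    return plan[min(len(history), len(plan) - 1)]


def run_episode(env: ToyEmbodiedEnv,
                agent: Callable[[Observation, List[str]], str]) -> Dict[str, object]:
    """Run one episode and report whether the task succeeded."""
    obs = env.reset()
    history: List[str] = []
    done = False
    info: Dict[str, object] = {"task_success": False}
    while not done:
        action = agent(obs, history)
        obs, _reward, done, info = env.step(action)
        history.append(action)
    return {"success": bool(info["task_success"]), "steps": len(history)}


if __name__ == "__main__":
    print(run_episode(ToyEmbodiedEnv(), scripted_agent))
```

In a real run, the scripted agent would be replaced by a call to the model under test, and the observation would carry the image and any environment feedback that the configuration enables.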
Quick Start & Requirements
Installation requires setting up three separate conda environments for the different environment groups. The repository is cloned with Git LFS so that the large dataset files are downloaded. Key dependencies include Habitat-Sim 0.3.0 (installed with the withbullet and headless options) and, for EB-Manipulation, CoppeliaSim V4.1.0 on Ubuntu 20.04. Running large open-source models locally requires sufficient GPU memory; guidance on tensor parallelism (tp) is provided. Evaluating proprietary models (OpenAI, Gemini, Anthropic, DashScope) requires the corresponding API keys. Running experiments on a headless server requires launching a headless server script.
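Because a missing API key usually only surfaces once an evaluation is already running, a small pre-flight check can help. The sketch below is one possible way to do it. The environment variable names are assumptions chosen for illustration, and missing_api_keys is a hypothetical helper, so check the README for the names the project actually reads.

```python
import os
from typing import Dict, Iterable, List, Mapping, Optional

# Assumed variable names, for illustration only; the project may use others.
ASSUMED_KEY_VARS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "dashscope": "DASHSCOPE_API_KEY",
}


def missing_api_keys(providers: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variable names that are unset or blank for the given providers."""
    source = os.environ if env is None else env
    missing = []
    for provider in providers:
        var = ASSUMED_KEY_VARS[provider]
        if not source.get(var, "").strip():
            missing.append(var)
    return missing


if __name__ == "__main__":
    absent = missing_api_keys(["openai", "anthropic"])
    if absent:
        print("Set these variables before evaluating:", ", ".join(absent))
    else:
        print("All required keys are present.")
```

Passing an explicit mapping as env makes the check easy to test without touching the real process environment.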
Highlighted Details
truncate feature for managing long conversation histories in navigation tasks.
Maintenance & Community
The project was accepted to ICML 2025, indicating recent academic relevance. News updates in 2025 highlight the release of training recipes and trajectory datasets. Specific community links (e.g., Discord, Slack) or detailed contributor information beyond the author list are not provided in the README.
Licensing & Compatibility
The provided README does not explicitly state the software license for EmbodiedBench. This omission requires further investigation for compatibility with commercial or closed-source applications.
Limitations & Caveats
Setting up EmbodiedBench involves a complex multi-environment installation process and requires downloading substantial datasets. The installation of CoppeliaSim for EB-Manipulation is noted as specific to Ubuntu 20.04. The README does not specify the project's license, posing a potential adoption blocker. Users are advised against enabling multiple advanced input flags (visual_icl, multiview, multistep, chat_history) simultaneously due to potential conflicts.
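The advice about advanced input flags can be enforced before launching a run. The sketch below is a minimal guard, assuming the configuration is available as a plain dictionary of booleans. The flag names come from the text above, while check_advanced_flags and enabled_advanced_flags are hypothetical helpers and not part of EmbodiedBench.

```python
from typing import Dict, List

# Flag names as listed in the caveat above.
ADVANCED_FLAGS = ("visual_icl", "multiview", "multistep", "chat_history")


def enabled_advanced_flags(config: Dict[str, object]) -> List[str]:
    """Return the advanced input flags that are switched on, in a fixed order."""
    return [name for name in ADVANCED_FLAGS if bool(config.get(name, False))]


def check_advanced_flags(config: Dict[str, object]) -> None:
    """Raise ValueError if more than one advanced input flag is enabled."""
    enabled = enabled_advanced_flags(config)
    if len(enabled) > 1:
        raise ValueError(
            "Enable at most one advanced input flag per run; got: "
            + ", ".join(enabled)
        )


if __name__ == "__main__":
    check_advanced_flags({"multiview": True})  # passes silently
    try:
        check_advanced_flags({"visual_icl": True, "chat_history": True})
    except ValueError as err:
        print(err)
```

Failing fast here is a design choice: a conflicting configuration is rejected before any environment or model is loaded, which is cheaper than discovering the conflict midway through an evaluation.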