open-eqa by facebookresearch

Benchmark dataset for embodied question answering (EQA) research

created 1 year ago
306 stars

Top 88.6% on sourcepulse

Project Summary

OpenEQA introduces a new formulation for Embodied Question Answering (EQA), enabling agents to answer questions about environments by leveraging episodic memory or active exploration. It targets researchers in Embodied AI and conversational agents, providing a benchmark dataset and an LLM-powered evaluation protocol to challenge foundation models.

How It Works

The project defines EQA as understanding an environment to answer questions in natural language. This understanding is achieved either through recalling past experiences (episodic memory) or by actively exploring the physical space. The OpenEQA dataset, comprising over 1600 human-generated question-answer pairs across 180 real-world environments, supports both these approaches. An automatic evaluation protocol using GPT-4 is also provided, demonstrating high correlation with human judgment.
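The LLM-based evaluation described above can be pictured as a simple judge loop: for each question, an LLM compares the model's answer against the human reference and emits a score, which is then averaged. The sketch below is a hedged illustration only; the field names, the 1-5 scale, and the prompt wording are assumptions, not the repo's exact schema or prompt, and a stub function stands in for the GPT-4 call.

```python
# Hedged sketch of an LLM-as-judge scoring loop for EQA-style benchmarks.
# The "question"/"answer" keys, the 1-5 scale, and the prompt text are
# illustrative assumptions, not OpenEQA's actual schema or prompt.

def build_judge_prompt(question, reference, prediction):
    """Format a grading prompt asking the judge LLM to rate the answer 1-5."""
    return (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Rate the model answer from 1 (wrong) to 5 (matches the reference). "
        "Reply with just the number."
    )

def score_predictions(dataset, predictions, judge):
    """Average judge scores over all QA pairs, rescaled to 0-100."""
    scores = []
    for item in dataset:
        raw = judge(build_judge_prompt(item["question"], item["answer"],
                                       predictions[item["question"]]))
        scores.append((int(raw) - 1) / 4 * 100)  # map 1-5 onto 0-100
    return sum(scores) / len(scores)

# Example with a stub judge standing in for a GPT-4 API call:
dataset = [{"question": "What color is the sofa?", "answer": "gray"}]
predictions = {"What color is the sofa?": "gray"}
print(score_predictions(dataset, predictions, judge=lambda prompt: "5"))  # → 100.0
```

In the real protocol the `judge` callable would wrap an OpenAI API request (hence the API-key prerequisite noted below); the stub keeps the sketch runnable offline.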

Quick Start & Requirements

  • Install: conda create -n openeqa python=3.9, conda activate openeqa, pip install -r requirements.txt, pip install -e .
  • Prerequisites: Python >= 3.9, OpenAI API key for evaluation.
  • Dataset: Episode histories require a separate download; follow the instructions in the repository.
  • Links: paper, project, dataset
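The install steps listed above amount to the following command sequence (taken directly from the Quick Start bullets; run them in order from a clone of the repository):

```shell
# Create and activate a dedicated environment (Python >= 3.9 required)
conda create -n openeqa python=3.9
conda activate openeqa

# Install dependencies, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .
```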

Highlighted Details

  • First open-vocabulary benchmark dataset for EQA supporting both episodic memory and active exploration.
  • LLM-powered automatic evaluation protocol shows excellent correlation with human judgment.
  • Evaluates state-of-the-art foundation models like GPT-4V, finding significant gaps compared to human performance.
  • Over 1600 high-quality human-generated questions from 180 real-world environments.

Maintenance & Community

The project is from Facebook Research, with notable contributors including Arjun Majumdar, Dhruv Batra, and Franziska Meier. No community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive for commercial use and closed-source linking.

Limitations & Caveats

The README indicates that current foundation models significantly lag behind human-level performance on this benchmark, suggesting it poses a considerable challenge. Episode histories require a separate download process.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 30 stars in the last 90 days

