open-eqa by facebookresearch

Benchmark dataset for embodied question answering (EQA) research

created 1 year ago
306 stars

Top 88.6% on sourcepulse

Project Summary

OpenEQA introduces a new formulation for Embodied Question Answering (EQA), enabling agents to answer questions about environments by leveraging episodic memory or active exploration. It targets researchers in Embodied AI and conversational agents, providing a benchmark dataset and an LLM-powered evaluation protocol to challenge foundation models.

How It Works

The project defines EQA as understanding an environment to answer questions in natural language. This understanding is achieved either through recalling past experiences (episodic memory) or by actively exploring the physical space. The OpenEQA dataset, comprising over 1600 human-generated question-answer pairs across 180 real-world environments, supports both these approaches. An automatic evaluation protocol using GPT-4 is also provided, demonstrating high correlation with human judgment.
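The LLM-based evaluation described above can be pictured as a simple judge loop: for each question, an LLM compares the model's answer against the human reference and emits a score, which is then averaged. The sketch below is a hedged illustration only; the field names, the 1-5 scale, and the prompt wording are assumptions, not the repo's exact schema or prompt, and a stub function stands in for the GPT-4 call.

```python
# Hedged sketch of an LLM-as-judge scoring loop for EQA-style benchmarks.
# The "question"/"answer" keys, the 1-5 scale, and the prompt text are
# illustrative assumptions, not OpenEQA's actual schema or prompt.

def build_judge_prompt(question, reference, prediction):
    """Format a grading prompt asking the judge LLM to rate the answer 1-5."""
    return (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Rate the model answer from 1 (wrong) to 5 (matches the reference). "
        "Reply with just the number."
    )

def score_predictions(dataset, predictions, judge):
    """Average judge scores over all QA pairs, rescaled to 0-100."""
    scores = []
    for item in dataset:
        raw = judge(build_judge_prompt(item["question"], item["answer"],
                                       predictions[item["question"]]))
        scores.append((int(raw) - 1) / 4 * 100)  # map 1-5 onto 0-100
    return sum(scores) / len(scores)

# Example with a stub judge standing in for a GPT-4 API call:
dataset = [{"question": "What color is the sofa?", "answer": "gray"}]
predictions = {"What color is the sofa?": "gray"}
print(score_predictions(dataset, predictions, judge=lambda prompt: "5"))  # → 100.0
```

In the real protocol the `judge` callable would wrap an OpenAI API request (hence the API-key prerequisite noted below); the stub keeps the sketch runnable offline.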

Quick Start & Requirements

  • Install: conda create -n openeqa python=3.9, conda activate openeqa, pip install -r requirements.txt, pip install -e .
  • Prerequisites: Python >= 3.9, OpenAI API key for evaluation.
  • Dataset: Episode histories require a separate download; follow the instructions in the repository.
  • Links: paper, project, dataset
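The install steps listed above amount to the following command sequence (taken directly from the Quick Start bullets; run them in order from a clone of the repository):

```shell
# Create and activate a dedicated environment (Python >= 3.9 required)
conda create -n openeqa python=3.9
conda activate openeqa

# Install dependencies, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .
```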

Highlighted Details

  • First open-vocabulary benchmark dataset for EQA supporting both episodic memory and active exploration.
  • LLM-powered automatic evaluation protocol shows excellent correlation with human judgment.
  • Evaluates state-of-the-art foundation models like GPT-4V, finding significant gaps compared to human performance.
  • Over 1600 high-quality human-generated questions from 180 real-world environments.

Maintenance & Community

The project is from Facebook Research, with notable contributors including Arjun Majumdar, Dhruv Batra, and Franziska Meier. No community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive for commercial use and closed-source linking.

Limitations & Caveats

The README indicates that current foundation models significantly lag behind human-level performance on this benchmark, suggesting it poses a considerable challenge. Episode histories require a separate download process.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 30 stars in the last 90 days

