alfred  by askforalfred

Benchmark dataset for instruction-following agents in interactive, visually-realistic environments

created 5 years ago
436 stars

Top 69.4% on sourcepulse

Project Summary

ALFRED is a benchmark dataset and framework for embodied AI agents that learn to follow natural language instructions for everyday household tasks. It targets researchers and engineers developing agents capable of grounded language understanding and action sequencing in simulated environments, aiming to bridge the gap between academic benchmarks and real-world applications.

How It Works

ALFRED uses the AI2-THOR simulator to create realistic household environments. Agents are trained to map egocentric vision and natural language instructions to sequences of actions. The benchmark emphasizes long, compositional rollouts and non-reversible state changes, which makes its tasks resemble real-world task execution.
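As a rough illustration of the loop described above, the sketch below steps a scripted agent through a stand-in environment. `ToyKitchen`, its state fields, and the action names are hypothetical and exist only for this example; they are not the AI2-THOR API or the benchmark's action space. The point is the non-reversible change: once the apple is sliced, no later action restores the earlier state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyKitchen:
    """Hypothetical stand-in for a simulator scene; not the AI2-THOR API."""
    holding: str = ""
    sliced: bool = False
    log: List[str] = field(default_factory=list)

    def step(self, action: str) -> bool:
        """Apply one action; return True if it succeeded."""
        if action == "pickup_knife" and not self.holding:
            self.holding = "knife"
        elif action == "slice_apple" and self.holding == "knife" and not self.sliced:
            self.sliced = True  # non-reversible: there is no "unslice" action
        elif action == "put_down" and self.holding:
            self.holding = ""
        else:
            self.log.append(f"failed: {action}")
            return False
        self.log.append(f"ok: {action}")
        return True


def run_episode(env: ToyKitchen, plan: List[str]) -> int:
    """Execute actions in order and count how many succeeded."""
    return sum(env.step(action) for action in plan)


# A scripted "policy" standing in for a learned model that would map
# an instruction plus egocentric frames to this action sequence.
plan = ["pickup_knife", "slice_apple", "put_down", "slice_apple"]
env = ToyKitchen()
succeeded = run_episode(env, plan)
```

The final `slice_apple` fails because the knife has already been put down, while the apple stays sliced. A learned agent faces the same constraint: a wrong irreversible action early in a long rollout cannot be undone later.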

Quick Start & Requirements

  • Install: Clone the repository and install requirements using pip install -r requirements.txt within a Python virtual environment.
  • Data: Download trajectory JSONs and ResNet features (~17GB) using sh download_data.sh json_feat.
  • Prerequisites: Python 3, PyTorch 1.1.0, Torchvision 0.3.0, AI2-THOR 2.1.0.
  • Hardware: Tested on a GTX 1080 Ti GPU (12GB), a quad-core CPU, 16GB RAM, and Ubuntu 16.04. OpenGL support is required for the simulator.
  • Docs: askforalfred.com
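
Because the prerequisites pin exact versions, it can help to check an environment before training. The following is a minimal sketch, not part of the repository: it parses `name==version` lines such as those printed by `pip freeze` and reports pins that are missing or different. The PyPI package names (`torch`, `torchvision`, `ai2thor`) and the helper names `parse_freeze` and `find_mismatches` are assumptions made for this example.

```python
from typing import Dict

# Versions pinned in the prerequisites above; the PyPI package names
# ("torch", "torchvision", "ai2thor") are assumed.
PINNED: Dict[str, str] = {
    "torch": "1.1.0",
    "torchvision": "0.3.0",
    "ai2thor": "2.1.0",
}


def parse_freeze(text: str) -> Dict[str, str]:
    """Parse `name==version` lines as printed by `pip freeze`."""
    installed = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # skip blank lines and entries without a pinned version
            installed[name.lower()] = version
    return installed


def find_mismatches(installed: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or different."""
    problems = {}
    for name, wanted in PINNED.items():
        found = installed.get(name)
        if found is None:
            problems[name] = f"missing (want {wanted})"
        elif found != wanted:
            problems[name] = f"found {found} (want {wanted})"
    return problems


# Sample input made up for illustration.
sample = "torch==1.1.0\ntorchvision==0.4.0\nnumpy==1.16.4\n"
report = find_mismatches(parse_freeze(sample))
```

With the made-up sample, `report` flags `torchvision` as a different version and `ai2thor` as missing. An empty dictionary would mean every pin is satisfied.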

Highlighted Details

  • Supports training Seq2Seq models with optional auxiliary losses for progress monitoring and subgoals.
  • Provides a framework for evaluating agent performance on seen and unseen test sets via email submissions to a leaderboard.
  • Includes Docker setup for easier deployment on cloud instances and headless environments.
  • Lists several state-of-the-art (SOTA) models and their associated papers and code.
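
To make the "optional auxiliary losses" item concrete, here is a minimal sketch of one common way to combine a main loss with optional auxiliary terms: a weighted sum in which a zero weight switches a term off. This is an assumed formulation for illustration; the source does not state how the repository combines its losses, and the function and parameter names below are hypothetical.

```python
def total_loss(action_loss: float,
               progress_loss: float = 0.0,
               subgoal_loss: float = 0.0,
               progress_weight: float = 0.0,
               subgoal_weight: float = 0.0) -> float:
    """Main loss plus optional weighted auxiliary terms.

    Assumed formulation: a weight of zero disables the matching term.
    """
    return (action_loss
            + progress_weight * progress_loss
            + subgoal_weight * subgoal_loss)


# Main action-prediction loss only.
baseline = total_loss(2.0)

# Same main loss with both auxiliary terms enabled (made-up values).
with_aux = total_loss(2.0, progress_loss=0.5, subgoal_loss=1.0,
                      progress_weight=0.5, subgoal_weight=0.25)
```

Keeping the weights as explicit parameters makes it easy to compare a run with and without each auxiliary signal.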

Maintenance & Community

  • The project is associated with prominent researchers from institutions like the University of Washington and Meta AI.
  • For questions or issues, contact askforalfred@googlegroups.com.
  • The AI2 leaderboard has been deprecated as of April 2025, with instructions for email submissions provided.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows commercial use and integration with closed-source projects.

Limitations & Caveats

The AI2 leaderboard has been deprecated, so evaluation requires manual email submissions. Training is resource-intensive: the trajectory JSONs and ResNet features alone are about 17GB, and the tested setup used a GTX 1080 Ti GPU.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 24 stars in the last 90 days
