Benchmark dataset for instruction-following agents in interactive, visually-realistic environments
ALFRED is a benchmark dataset and framework for embodied AI agents that learn to follow natural language instructions for everyday household tasks. It targets researchers and engineers developing agents capable of grounded language understanding and action sequencing in simulated environments, aiming to bridge the gap between academic benchmarks and real-world applications.
How It Works
ALFRED utilizes the AI2-THOR simulator to create realistic household environments. Agents are trained to map egocentric vision and natural language instructions to sequences of actions. The benchmark emphasizes long, compositional rollouts and irreversible state changes, presenting challenges similar to real-world task execution.
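The mapping described above, from an egocentric observation plus an instruction to a discrete action, can be sketched as a simple agent loop. The `ScriptedAgent`, the `Observation` container, and the action names below are hypothetical stand-ins for illustration, not ALFRED's actual API (a real agent would run a learned policy inside the AI2-THOR simulator).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical discrete action space, similar in spirit to AI2-THOR-style actions.
ACTIONS = ["MoveAhead", "RotateLeft", "RotateRight", "Pickup", "Put", "Stop"]

@dataclass
class Observation:
    rgb_frame: List[List[int]]  # egocentric image (placeholder for real pixels)
    instruction: str            # natural language goal, e.g. "put the mug in the sink"

class ScriptedAgent:
    """Toy agent that replays a fixed plan (illustration only, not a learned policy)."""
    def __init__(self, plan):
        self.plan = list(plan)

    def act(self, obs: Observation) -> str:
        # A real ALFRED agent would condition on obs.rgb_frame and obs.instruction.
        return self.plan.pop(0) if self.plan else "Stop"

def rollout(agent, instruction, max_steps=10):
    """Step the agent until it emits Stop; return the resulting action sequence."""
    trajectory = []
    for _ in range(max_steps):
        obs = Observation(rgb_frame=[[0]], instruction=instruction)
        action = agent.act(obs)
        trajectory.append(action)
        if action == "Stop":
            break
    return trajectory

agent = ScriptedAgent(["MoveAhead", "Pickup", "RotateLeft", "Put"])
print(rollout(agent, "put the mug in the sink"))
# → ['MoveAhead', 'Pickup', 'RotateLeft', 'Put', 'Stop']
```

Because state changes in the environment are irreversible, a wrong action cannot simply be undone, which is why long action sequences like this one are hard to get right.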
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt within a Python virtual environment, then download the preprocessed dataset:

sh download_data.sh json_feat
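Once downloaded, each trajectory ships with JSON annotations pairing a high-level goal with step-by-step sub-instructions. The snippet below parses a minimal inline record; the key layout (`turk_annotations` → `anns` → `task_desc` / `high_descs`) is an assumption about the dataset schema made for illustration.

```python
import json

# A minimal trajectory record mimicking the (assumed) ALFRED annotation layout.
raw = json.dumps({
    "task_type": "pick_and_place_simple",
    "turk_annotations": {
        "anns": [
            {
                "task_desc": "Put a mug in the sink.",
                "high_descs": [
                    "Walk to the counter.",
                    "Pick up the mug.",
                    "Place the mug in the sink.",
                ],
            }
        ]
    },
})

traj = json.loads(raw)
ann = traj["turk_annotations"]["anns"][0]
goal = ann["task_desc"]     # high-level goal instruction
steps = ann["high_descs"]   # ordered sub-instructions, one per high-level action
print(goal)
print(len(steps))
```

An agent is trained to ground both levels: the single goal sentence and the ordered sub-instructions that decompose it.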
Highlighted Details
Maintenance & Community
Questions can be directed to the mailing list at askforalfred@googlegroups.com.
Licensing & Compatibility
Limitations & Caveats
The AI2 leaderboard has been deprecated, so evaluation requires manual email submission. Training can be resource-intensive, involving large data downloads and substantial GPU resources.