Web agent benchmark dataset and models (NeurIPS'23 Spotlight paper)
Mind2Web provides a comprehensive dataset and framework for developing and evaluating generalist web agents. It addresses the limitations of existing web agent benchmarks by using real-world websites and a broad spectrum of user interaction patterns, enabling the creation of agents that can follow natural language instructions to complete complex tasks across diverse websites. The target audience includes researchers and developers focused on LLM-based agents and web automation.
How It Works
The project offers a dataset of over 2,000 open-ended tasks across 137 websites, featuring crowdsourced action sequences. It supports a two-stage approach: candidate generation using a DeBERTa-v3-base model for identifying potential interactive elements, and action prediction using seq2seq T5 models (like Flan-T5) or LLMs (like GPT-3.5/4) to select the final action. This modular design allows for flexibility in agent architecture and evaluation.
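The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Element` class, `rank_candidates`, `predict_action`, and the toy scoring and selection functions are all hypothetical stand-ins. In the real pipeline the scorer would be the DeBERTa-v3-base ranker and the chooser would be a T5 model or an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Element:
    """Hypothetical, simplified view of one interactive page element."""
    element_id: str
    text: str


def rank_candidates(
    task: str,
    elements: List[Element],
    score_fn: Callable[[str, Element], float],
    top_k: int,
) -> List[Element]:
    """Stage 1: score every element against the task and keep the top_k."""
    ranked = sorted(elements, key=lambda e: score_fn(task, e), reverse=True)
    return ranked[:top_k]


def predict_action(
    task: str,
    candidates: List[Element],
    choose_fn: Callable[[str, List[Element]], Tuple[str, str]],
) -> Tuple[str, str]:
    """Stage 2: choose one candidate and an operation to perform on it."""
    return choose_fn(task, candidates)


# Toy stand-in for the candidate ranker (assumption: word overlap as the score).
def overlap_score(task: str, element: Element) -> float:
    task_words = set(task.lower().split())
    return float(len(task_words & set(element.text.lower().split())))


# Toy stand-in for the action predictor (assumption: click the best candidate).
def click_first(task: str, candidates: List[Element]) -> Tuple[str, str]:
    return candidates[0].element_id, "CLICK"


elements = [
    Element("e1", "Sign in"),
    Element("e2", "Search flights"),
    Element("e3", "Contact us"),
]
task = "search flights from Columbus to Seattle"

candidates = rank_candidates(task, elements, overlap_score, top_k=2)
action = predict_action(task, candidates, click_first)
print(action)  # ('e2', 'CLICK')
```

Keeping the two stages behind plain callables mirrors the modularity noted above: either stage can be swapped out without touching the other.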
Quick Start & Requirements
Install dependencies with pip. The dataset is identified as osunlp/Mind2Web; download and unzip the test data (password: mind2web).
Highlighted Details
Maintenance & Community
The project is associated with The Ohio State University (OSU-NLP-Group). Updates are regularly posted, including releases for multimodal data and related projects like SeeAct.
Licensing & Compatibility
Limitations & Caveats
The raw dump data is large and requires Globus for access. Matching network traffic (HAR files) to specific actions can be non-trivial due to web dynamism. The dataset is intended for research purposes, and the authors strongly caution against harmful use.
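To illustrate why matching HAR traffic to actions takes some care, here is a minimal sketch that selects HAR entries starting within a short window after an action. It relies only on the standard HAR layout (`log.entries[].startedDateTime`). The action timestamp, the window length, and the helper names are assumptions made for illustration; the layout of the Mind2Web raw dump is not described here and may differ.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def parse_har_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; map a trailing 'Z' to an explicit UTC offset."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def entries_in_window(
    har: Dict[str, Any], start: datetime, window: timedelta
) -> List[Dict[str, Any]]:
    """Return HAR entries whose request started in [start, start + window)."""
    end = start + window
    return [
        entry
        for entry in har["log"]["entries"]
        if start <= parse_har_time(entry["startedDateTime"]) < end
    ]


# Tiny in-memory HAR fragment (made-up data, example.com URLs).
har = {
    "log": {
        "entries": [
            {
                "startedDateTime": "2023-05-01T12:00:00.200Z",
                "request": {"url": "https://example.com/a"},
            },
            {
                "startedDateTime": "2023-05-01T12:00:03.000Z",
                "request": {"url": "https://example.com/b"},
            },
        ]
    }
}

# Hypothetical action timestamp; a real pipeline would take it from the action log.
action_time = parse_har_time("2023-05-01T12:00:00.000Z")
matched = entries_in_window(har, action_time, timedelta(seconds=1))
urls = [entry["request"]["url"] for entry in matched]
print(urls)  # ['https://example.com/a']
```

A fixed time window is only a heuristic: background polling, prefetching, and delayed responses can fall inside or outside it, which is one reason the matching is non-trivial.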