Mind2Web by OSU-NLP-Group

Web agent benchmark dataset and models (NeurIPS'23 Spotlight paper)

Created 2 years ago

946 stars

Top 38.6% on SourcePulse

Project Summary

Mind2Web provides a comprehensive dataset and framework for developing and evaluating generalist web agents. It addresses the limitations of existing web agent benchmarks by using real-world websites and a broad spectrum of user interaction patterns, enabling the creation of agents that can follow natural language instructions to complete complex tasks across diverse websites. The target audience includes researchers and developers focused on LLM-based agents and web automation.

How It Works

The project offers a dataset of over 2,000 open-ended tasks across 137 websites, featuring crowdsourced action sequences. It supports a two-stage approach: candidate generation using a DeBERTa-v3-base model for identifying potential interactive elements, and action prediction using seq2seq T5 models (like Flan-T5) or LLMs (like GPT-3.5/4) to select the final action. This modular design allows for flexibility in agent architecture and evaluation.
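The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the `Candidate` class, the token-overlap scorer (a toy stand-in for the DeBERTa-v3-base ranker), the multiple-choice prompt layout, and the `chooser` callback (a stand-in for a Flan-T5 or GPT model) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    element_id: str
    text: str  # flattened text/attributes of a DOM element (assumed representation)


def overlap_score(task: str, candidate: Candidate) -> float:
    """Toy stage-1 scorer: fraction of task tokens that appear in the element text."""
    task_tokens = set(task.lower().split())
    cand_tokens = set(candidate.text.lower().split())
    if not task_tokens:
        return 0.0
    return len(task_tokens & cand_tokens) / len(task_tokens)


def generate_candidates(task: str, elements: List[Candidate], top_k: int = 2) -> List[Candidate]:
    """Stage 1: rank all elements and keep the top_k most relevant ones."""
    ranked = sorted(elements, key=lambda c: overlap_score(task, c), reverse=True)
    return ranked[:top_k]


def build_prompt(task: str, candidates: List[Candidate]) -> str:
    """Format the shortlisted elements as a multiple-choice question (hypothetical layout)."""
    options = [f"{chr(ord('A') + i)}. {c.text}" for i, c in enumerate(candidates)]
    return f"Task: {task}\nWhich element should be acted on next?\n" + "\n".join(options)


def select_action(task: str, candidates: List[Candidate],
                  chooser: Callable[[str], int]) -> Candidate:
    """Stage 2: a model (here any callable returning an option index) picks one candidate."""
    index = chooser(build_prompt(task, candidates))
    return candidates[index]


elements = [
    Candidate("e1", "button search"),
    Candidate("e2", "link sign in"),
    Candidate("e3", "input destination city"),
    Candidate("e4", "link careers"),
]
task = "enter destination city and search"

shortlist = generate_candidates(task, elements, top_k=2)
chosen = select_action(task, shortlist, chooser=lambda prompt: 0)  # dummy model: always option A
print([c.element_id for c in shortlist], chosen.element_id)
```

Keeping the ranker and the selector behind separate functions mirrors the modular design: either stage can be swapped for a stronger model without touching the other.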

Quick Start & Requirements

Install: Clone the repository and install dependencies via pip.
Dataset: Clone the training data from Hugging Face (osunlp/Mind2Web), then download and unzip the test data (password: mind2web).
Prerequisites: Python, PyTorch, Hugging Face libraries, and an OpenAI API key for LLM evaluation.
Resources: Requires significant disk space for dataset downloads. Fine-tuning and evaluation may require GPUs.
Links: Project Website, Hugging Face Datasets, SeeAct.
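As a rough sketch of the dataset step, the training split can likely be pulled with the Hugging Face `datasets` library instead of a manual clone. The repository ID `osunlp/Mind2Web` comes from the text above. The `train` split name and the record fields `website`, `confirmed_task`, and `actions` are assumptions about the schema, and `summarize_task` is a hypothetical helper.

```python
def load_train_split():
    """Fetch the training split from the Hugging Face Hub.

    Needs network access and `pip install datasets`; the split name is an assumption.
    """
    from datasets import load_dataset  # imported lazily so the helper below also works offline
    return load_dataset("osunlp/Mind2Web", split="train")


def summarize_task(record):
    """Return a one-line summary of a task record (field names are assumed, not confirmed)."""
    n_steps = len(record.get("actions", []))
    website = record.get("website", "?")
    task = record.get("confirmed_task", "?")
    return f"[{website}] {task} ({n_steps} steps)"


# Offline example with a hand-made record shaped like the assumed schema.
sample = {"website": "example", "confirmed_task": "Book a table for two", "actions": [{}, {}]}
print(summarize_task(sample))
```

With network access, `summarize_task(load_train_split()[0])` would print the first training task in the same format, provided the assumed field names match.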

Highlighted Details

NeurIPS'23 Spotlight paper.
Includes multimodal data (HTML + screenshots) for enhanced agent capabilities.
Offers fine-tuning code and pre-trained models for both candidate generation and action prediction.
Supports evaluation using both fine-tuned models and large language models (LLMs).

Maintenance & Community

The project is maintained by the OSU NLP Group at The Ohio State University. Updates posted to the repository include the multimodal data release and related projects such as SeeAct.

Licensing & Compatibility

Dataset License: Creative Commons Attribution 4.0 International License (CC BY 4.0).
Code License: MIT License.
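Compatibility: Both licenses are permissive and allow commercial use and integration into closed-source projects, provided the dataset is attributed (CC BY 4.0) and the MIT copyright notice is retained with the code.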

Limitations & Caveats

The raw dump data is large and must be accessed through Globus. Matching network traffic (HAR files) to specific actions can be non-trivial because web pages change dynamically. The dataset is intended for research purposes, and the authors strongly caution against harmful use.

Health Check

Last Commit: 3 months ago
Responsiveness: Inactive
Star History: 9 stars in the last 30 days