Mind2Web by OSU-NLP-Group

Web agent benchmark dataset and models (NeurIPS'23 Spotlight paper)

created 2 years ago
850 stars

Top 42.9% on sourcepulse

Project Summary

Mind2Web provides a comprehensive dataset and framework for developing and evaluating generalist web agents. It addresses the limitations of existing web agent benchmarks by using real-world websites and a broad spectrum of user interaction patterns, enabling the creation of agents that can follow natural language instructions to complete complex tasks across diverse websites. The target audience includes researchers and developers focused on LLM-based agents and web automation.

How It Works

The project offers a dataset of over 2,000 open-ended tasks across 137 websites, each paired with a crowdsourced action sequence. It supports a two-stage approach. In candidate generation, a DeBERTa-v3-base model identifies potential interactive elements on the page. In action prediction, a seq2seq T5 model (such as Flan-T5) or an LLM (such as GPT-3.5/4) selects the final action from those candidates. Because the stages are separate, either one can be swapped out or evaluated independently.
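The two-stage flow described above can be sketched in a few lines of Python. This is an illustrative toy, not the project's code: the `Candidate` class, `rank_candidates`, `build_prompt`, and `predict_action` are hypothetical names, the token-overlap scorer merely stands in for the fine-tuned DeBERTa-v3-base ranker, and the multiple-choice prompt with a pluggable `choose` callable stands in for a Flan-T5 or GPT model. Framing stage two as a multiple-choice question is an assumption of this sketch.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    element_id: str  # hypothetical identifier for a page element
    text: str        # flattened text/attributes of that element


def _tokens(s: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def rank_candidates(task: str, candidates: list[Candidate], top_k: int = 3) -> list[Candidate]:
    """Stage 1 stand-in: rank elements by token overlap with the task.

    A real system would score each (task, element) pair with a trained ranker.
    """
    task_tokens = _tokens(task)
    ranked = sorted(
        candidates,
        key=lambda c: len(task_tokens & _tokens(c.text)),
        reverse=True,  # highest overlap first; ties keep their input order
    )
    return ranked[:top_k]


def build_prompt(task: str, shortlist: list[Candidate]) -> str:
    """Format the shortlist as a multiple-choice question for stage 2."""
    lines = [f"Task: {task}", "Which element should be acted on next?"]
    for letter, cand in zip("ABCDEFGH", shortlist):
        lines.append(f"{letter}. {cand.text}")
    return "\n".join(lines)


def predict_action(
    task: str,
    shortlist: list[Candidate],
    choose: Callable[[str], str] = lambda prompt: "A",
) -> tuple[str, str]:
    """Stage 2 stand-in: `choose` maps a prompt to an option letter.

    The default always answers "A" (the top-ranked element); a real agent
    would call a seq2seq model or an LLM here. The operation is fixed to
    CLICK to keep the sketch small.
    """
    letter = choose(build_prompt(task, shortlist))
    index = "ABCDEFGH".index(letter)
    return shortlist[index].element_id, "CLICK"


page = [
    Candidate("e1", "Search flights button"),
    Candidate("e2", "Sign in link"),
    Candidate("e3", "Newsletter footer"),
    Candidate("e4", "Flights from Columbus"),
]
task = "Search for flights from Columbus to Seattle"
shortlist = rank_candidates(task, page, top_k=2)
action = predict_action(task, shortlist)
```

Keeping the ranker and the selector behind separate functions mirrors the modularity noted above: the `choose` callable can be replaced without touching the candidate-generation step.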

Quick Start & Requirements

  • Install: Clone the repository and install its dependencies with pip.
  • Dataset: Clone the training data from Hugging Face (osunlp/Mind2Web), then download and unzip the test data (password: mind2web).
  • Prerequisites: Python, PyTorch, the Hugging Face libraries, and an OpenAI API key for LLM-based evaluation.
  • Resources: The dataset downloads require significant disk space. Fine-tuning and evaluation may require GPUs.
  • Links: Project Website, Hugging Face Datasets, SeeAct.
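As an alternative to cloning the training data, the Hugging Face `datasets` library can usually fetch a hosted dataset by its repository ID. The sketch below assumes that `osunlp/Mind2Web` loads this way with a `train` split, and that each record carries `website`, `confirmed_task`, and `action_reprs` fields. Those field names are assumptions to check against the dataset card, and `load_train_split` and `summarize_task` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Any


def load_train_split():
    """Fetch the training split from the Hugging Face Hub.

    Requires `pip install datasets` and network access. The import is done
    lazily so that the helper below can be used without the package.
    """
    from datasets import load_dataset

    return load_dataset("osunlp/Mind2Web", split="train")


def summarize_task(sample: dict[str, Any]) -> str:
    """One-line summary of a task record.

    The field names are assumed, so missing keys fall back to placeholders.
    """
    website = sample.get("website", "?")
    task = sample.get("confirmed_task", "?")
    steps = sample.get("action_reprs") or []
    return f"[{website}] {task} ({len(steps)} steps)"
```

A typical use would be `print(summarize_task(load_train_split()[0]))`, which downloads the data on first call.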

Highlighted Details

  • NeurIPS'23 Spotlight paper.
  • Includes multimodal data (HTML + screenshots) for enhanced agent capabilities.
  • Offers fine-tuning code and pre-trained models for both candidate generation and action prediction.
  • Supports evaluation using both fine-tuned models and large language models (LLMs).

Maintenance & Community

The project is maintained by OSU-NLP-Group at The Ohio State University. The repository posts updates, including the release of the multimodal data and of related projects such as SeeAct.

Licensing & Compatibility

  • Dataset License: Creative Commons Attribution 4.0 International License (CC BY 4.0).
  • Code License: MIT License.
  • Compatibility: Both licenses are permissive and allow commercial use and integration into closed-source projects, provided the CC BY 4.0 attribution and the MIT license notice are retained.

Limitations & Caveats

The raw dump data is large and requires Globus for access. Matching network traffic (HAR files) to specific actions can be non-trivial because web pages change dynamically. The dataset is intended for research purposes, and the authors strongly caution against harmful use.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 38 stars in the last 90 days
