webarena by web-arena-x

Web environment for autonomous agent development

Created 3 years ago

1,539 stars

Top 26.1% on SourcePulse


3 Experts Love This Project

Ying Sheng

Coauthor of SGLang

Jiayi Pan

Author of SWE-Gym; MTS at xAI

Travis Fischer

Founder of Agentic

Project Summary

WebArena provides a realistic, self-hostable web environment for developing and evaluating autonomous agents. It addresses the need for reproducible web navigation research by offering a standardized framework with diverse, real-world websites and tasks, benefiting researchers and developers in the field of AI agents.

How It Works

WebArena wraps a real browser, driven by Playwright, in an agent environment: agents observe each page as an accessibility tree whose elements carry numeric IDs, and they act by issuing ID-based commands (for example, clicking or typing into element [id]). Reducing raw DOM manipulation to discrete, ID-addressed steps simplifies agent development and makes it straightforward to let large language models (LLMs) make the decisions.
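To make the observation/action split concrete, here is a minimal, self-contained Python sketch. It does not call WebArena: the `AXNode` and `Action` types, the `render_tree`/`parse_action` helpers, and the exact `click [id]` / `type [id] [text]` grammar are hypothetical, loosely modeled on the ID-based style described above. It shows how an accessibility tree can be flattened into ID-tagged text for an LLM prompt, and how the model's reply can be checked against the IDs actually present on the page.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AXNode:
    """Toy accessibility-tree node (hypothetical; not WebArena's internal type)."""
    node_id: int
    role: str
    name: str
    children: list["AXNode"] = field(default_factory=list)


@dataclass
class Action:
    """Parsed ID-based action (hypothetical structure)."""
    kind: str
    target_id: int
    text: Optional[str] = None


def render_tree(node: AXNode, depth: int = 0) -> list[str]:
    """Flatten the tree into tab-indented "[id] role 'name'" lines for an LLM prompt."""
    lines = ["\t" * depth + f"[{node.node_id}] {node.role} '{node.name}'"]
    for child in node.children:
        lines.extend(render_tree(child, depth + 1))
    return lines


def collect_ids(node: AXNode) -> set[int]:
    """Gather every element ID visible in the current observation."""
    ids = {node.node_id}
    for child in node.children:
        ids |= collect_ids(child)
    return ids


# Assumed grammar: "click [12]", "hover [12]", or "type [7] [some text]".
_ACTION_RE = re.compile(r"^(click|hover|type)\s+\[(\d+)\](?:\s+\[(.*)\])?$")


def parse_action(raw: str, root: AXNode) -> Action:
    """Turn an LLM's action string into an Action, rejecting IDs not on the page."""
    match = _ACTION_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {raw!r}")
    kind, target, text = match.group(1), int(match.group(2)), match.group(3)
    if target not in collect_ids(root):
        raise ValueError(f"element [{target}] is not in the current observation")
    # Only "type" takes a text argument; click/hover must not have one.
    if (kind == "type") != (text is not None):
        raise ValueError(f"{kind!r} got the wrong number of arguments")
    return Action(kind=kind, target_id=target, text=text)


if __name__ == "__main__":
    page = AXNode(1, "RootWebArea", "Shop", [
        AXNode(5, "searchbox", "Search"),
        AXNode(9, "button", "Go"),
    ])
    print("\n".join(render_tree(page)))
    print(parse_action("type [5] [laptop]", page))
```

Validating the ID before dispatch is a design choice of this sketch: an LLM may name an element that is not on screen, and rejecting it early gives the agent a clear error to recover from rather than a failed browser call.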

Quick Start & Requirements

Install: conda create -n webarena python=3.10; conda activate webarena; pip install -r requirements.txt; playwright install; pip install -e .
Prerequisites: Python 3.10+, Playwright, OpenAI API key.
Setup: Requires self-hosting each website (e.g., shopping, Reddit, GitLab) and setting environment variables that point to their URLs. An Amazon Machine Image (AMI) with the sites pre-installed is available as an alternative.
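Because every task depends on the site URLs being configured, a pre-flight check can save a confusing browser timeout later. The sketch below is hypothetical: the helper names are illustrative, and the variable names in `ASSUMED_SITE_VARS` are recalled from the WebArena README rather than confirmed here, so check the repository's setup docs for the authoritative list.

```python
import os
from typing import Mapping

# Site variables as we recall them from the WebArena README (an assumption;
# check the repository's current setup docs for the authoritative list).
ASSUMED_SITE_VARS = (
    "SHOPPING",
    "SHOPPING_ADMIN",
    "REDDIT",
    "GITLAB",
    "MAP",
    "WIKIPEDIA",
    "HOMEPAGE",
)


def missing_site_vars(
    env: Mapping[str, str] = os.environ,
    required: tuple[str, ...] = ASSUMED_SITE_VARS,
) -> list[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def check_site_vars(env: Mapping[str, str] = os.environ) -> None:
    """Fail fast with a readable message listing every unset site URL."""
    missing = missing_site_vars(env)
    if missing:
        raise RuntimeError("unset site URLs: " + ", ".join(missing))
```

Since an OpenAI API key is also a prerequisite, one could pass `ASSUMED_SITE_VARS + ("OPENAI_API_KEY",)` as `required` to cover it in the same check.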
Docs: Website

Highlighted Details

Supports parallel experiments via BrowserGym.
Integrates popular benchmarks like VisualWebArena.
Offers unified leaderboard reporting.
Includes human annotator trajectories for ~170 tasks.

Maintenance & Community

The project has received infrastructure updates through AgentLab, and resources for analysis and community interaction are available via Zeno and a dedicated website. However, the Health Check metrics (last commit 7 months ago, responsiveness marked inactive) suggest development has since slowed.

Licensing & Compatibility

The repository is released under a permissive license, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The demo sites are for browsing only; reproducible experiments require setting up and configuring your own standalone WebArena websites. The evaluation process involves specific setup steps for each website and obtaining auto-login cookies.
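Expired logins make tasks fail for reasons unrelated to the agent, so it can help to check the saved cookies before a run. This sketch assumes the auto-login step saves each site's session as a Playwright storage-state JSON file: a top-level `cookies` list whose entries carry an `expires` Unix timestamp in seconds, with -1 for session cookies. The helpers `expired_cookies` and `needs_relogin` are hypothetical and not part of WebArena.

```python
import json
import time
from pathlib import Path
from typing import Optional, Union


def expired_cookies(state: dict, now: Optional[float] = None) -> list[str]:
    """Return "domain:name" for each expired cookie in a Playwright storage state.

    Playwright stores `expires` as Unix seconds; -1 marks a session cookie,
    which this sketch treats as still valid.
    """
    now = time.time() if now is None else now
    stale = []
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        if expires != -1 and expires < now:
            stale.append(f"{cookie.get('domain', '?')}:{cookie.get('name', '?')}")
    return stale


def needs_relogin(state_path: Union[str, Path], now: Optional[float] = None) -> bool:
    """True if the saved login state is missing, has no cookies, or holds an expired one."""
    path = Path(state_path)
    if not path.is_file():
        return True
    state = json.loads(path.read_text())
    if not state.get("cookies"):
        return True
    return bool(expired_cookies(state, now))
```

One might call `needs_relogin` on each site's state file before a batch of tasks and rerun the repository's login step only when it returns True.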

Health Check

Last Commit

7 months ago

Responsiveness

Inactive


Star History

34 stars in the last 30 days