webarena  by web-arena-x

Web environment for autonomous agent development

created 2 years ago
1,080 stars

Top 35.7% on sourcepulse

Project Summary

WebArena provides a realistic, self-hostable web environment for developing and evaluating autonomous agents. It addresses the need for reproducible web navigation research by offering a standardized framework with diverse, real-world websites and tasks, benefiting researchers and developers in the field of AI agents.

How It Works

WebArena simulates a browser environment, allowing agents to interact with websites through an accessibility tree observation and ID-based actions. This approach simplifies agent development by abstracting complex DOM manipulation into discrete, actionable steps, facilitating easier integration with large language models (LLMs) for decision-making.
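
The observe-then-act loop described above can be illustrated with a small, self-contained sketch. It does not use WebArena's own classes: `Node`, `render_observation`, and `parse_action` are hypothetical names, and the `click [id]` / `type [id] [text]` action syntax is an assumption chosen for illustration. The idea is that the agent reads a flat text listing of accessibility-tree nodes and replies with a command that names a node by ID.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one accessibility-tree node."""
    node_id: int
    role: str
    name: str


def render_observation(nodes):
    """Flatten nodes into the text an LLM would read: one '[id] role name' line each."""
    return "\n".join(f"[{n.node_id}] {n.role} {n.name!r}" for n in nodes)


# Assumed action grammar: a verb, a bracketed node ID, and an optional bracketed argument.
ACTION_RE = re.compile(r"^(click|hover|type)\s+\[(\d+)\](?:\s+\[(.*)\])?$")


def parse_action(text, nodes):
    """Turn an ID-based action string into a structured action on a known node."""
    match = ACTION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {text!r}")
    verb, raw_id, argument = match.groups()
    by_id = {n.node_id: n for n in nodes}
    node_id = int(raw_id)
    if node_id not in by_id:
        raise ValueError(f"no node with id {node_id} in the current observation")
    return {"verb": verb, "target": by_id[node_id], "text": argument}


page = [Node(1, "textbox", "Search"), Node(2, "button", "Go")]
print(render_observation(page))
print(parse_action("type [1] [laptop]", page))
```

Because every action refers to an ID that appears in the observation, an invalid model reply can be rejected before anything touches the browser.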

Quick Start & Requirements

  • Install: conda create -n webarena python=3.10, conda activate webarena, pip install -r requirements.txt, playwright install, pip install -e .
  • Prerequisites: Python 3.10+, Playwright, OpenAI API key.
  • Setup: Requires hosting individual websites (e.g., shopping, Reddit, GitLab) and configuring environment variables. An AMI is available for pre-installed environments.
  • Docs: Website
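
Since setup depends on environment variables pointing at each self-hosted site, a quick pre-flight check can catch a missing one before a run starts. The sketch below is an assumption-laden illustration: the variable names `SHOPPING`, `REDDIT`, and `GITLAB` simply mirror the example sites listed above and should be confirmed against the repository's README, and `missing_site_vars` is a hypothetical helper, not part of WebArena.

```python
import os

# Assumed variable names, one per example site; verify against the project's README.
REQUIRED_SITE_VARS = ("SHOPPING", "REDDIT", "GITLAB")


def missing_site_vars(env=None):
    """Return the required variable names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SITE_VARS if not env.get(name, "").strip()]


missing = missing_site_vars()
if missing:
    print("Unset site variables: " + ", ".join(missing))
else:
    print("All site variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise without touching the real environment.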

Highlighted Details

  • Supports parallel experiments via BrowserGym.
  • Integrates popular benchmarks like VisualWebArena.
  • Offers unified leaderboard reporting.
  • Includes human annotator trajectories for ~170 tasks.

Maintenance & Community

The project is actively maintained, with recent updates enhancing its infrastructure through AgentLab. Resources for analysis and community interaction are available via Zeno and a dedicated website.

Licensing & Compatibility

The repository is released under a permissive license, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The demo sites are for browsing only; reproducible experiments require setting up and configuring your own standalone WebArena websites. The evaluation process involves specific setup steps for each website and obtaining auto-login cookies.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 108 stars in the last 90 days
