webarena  by web-arena-x

Web environment for autonomous agent development

Created 2 years ago
1,143 stars

Top 33.7% on SourcePulse

Project Summary

WebArena provides a realistic, self-hostable web environment for developing and evaluating autonomous agents. It addresses the need for reproducible web navigation research by offering a standardized framework with diverse, real-world websites and tasks, benefiting researchers and developers in the field of AI agents.

How It Works

WebArena simulates a browser environment, allowing agents to interact with websites through an accessibility tree observation and ID-based actions. This approach simplifies agent development by abstracting complex DOM manipulation into discrete, actionable steps, facilitating easier integration with large language models (LLMs) for decision-making.
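The observation/action loop described above can be illustrated with a minimal sketch. The observation text, the bracketed action syntax, and the names `parse_observation`, `parse_action`, and `Action` are assumptions made for this example, not WebArena's actual API; the point is only to show how element IDs in a flattened accessibility tree let an LLM emit short, checkable actions instead of manipulating the DOM.

```python
import re
from dataclasses import dataclass

# Hypothetical flattened accessibility-tree observation: one node per line,
# each prefixed with a numeric element ID the agent can refer to.
OBSERVATION = """\
[1] RootWebArea 'Example Shop'
[2] textbox 'Search'
[3] button 'Go'
"""


@dataclass
class Action:
    name: str          # "click" or "type" in this sketch
    element_id: int    # ID taken from the observation
    text: str = ""     # text to enter, only used by "type"


NODE_RE = re.compile(r"^\[(\d+)\]\s+(\w+)\s+'(.*)'$")
ACTION_RE = re.compile(r"^(click|type)\s+\[(\d+)\](?:\s+\[(.*)\])?$")


def parse_observation(obs: str) -> dict:
    """Map each element ID to its (role, name) pair."""
    nodes = {}
    for line in obs.splitlines():
        match = NODE_RE.match(line.strip())
        if match:
            nodes[int(match.group(1))] = (match.group(2), match.group(3))
    return nodes


def parse_action(raw: str, nodes: dict) -> Action:
    """Turn a model-emitted string into an Action, rejecting unknown IDs."""
    match = ACTION_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized action: {raw!r}")
    name, element_id, text = match.group(1), int(match.group(2)), match.group(3) or ""
    if element_id not in nodes:
        raise ValueError(f"element ID {element_id} is not in the observation")
    return Action(name, element_id, text)


nodes = parse_observation(OBSERVATION)
action = parse_action("type [2] [red shoes]", nodes)
```

Validating the ID against the current observation before executing anything is the main design choice here: it catches hallucinated element references early, which is one practical benefit of ID-based actions.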

Quick Start & Requirements

  • Install (run in order): conda create -n webarena python=3.10; conda activate webarena; pip install -r requirements.txt; playwright install; pip install -e .
  • Prerequisites: Python 3.10+, Playwright, OpenAI API key.
  • Setup: Requires hosting individual websites (e.g., shopping, Reddit, GitLab) and configuring environment variables. An AMI is available for pre-installed environments.
  • Docs: Website
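
Since the setup depends on environment variables pointing at the self-hosted sites, a small pre-flight check can catch a missing or malformed value before an experiment starts. The sketch below is illustrative only: the variable names `SHOPPING`, `REDDIT`, and `GITLAB` are assumed from the site list above, and `check_site_urls` is a hypothetical helper, so consult the project's documentation for the names it actually reads.

```python
import os
from urllib.parse import urlparse

# Assumed variable names, one per self-hosted site; the project's own
# documentation is the authority on the real names.
REQUIRED_SITE_VARS = ("SHOPPING", "REDDIT", "GITLAB")


def check_site_urls(env, required=REQUIRED_SITE_VARS):
    """Return a dict of problems keyed by variable name (empty if none found)."""
    problems = {}
    for name in required:
        value = env.get(name, "").strip()
        if not value:
            problems[name] = "not set"
            continue
        parsed = urlparse(value)
        # Require an explicit http(s) scheme and a host part.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems[name] = f"not an http(s) URL: {value!r}"
    return problems


if __name__ == "__main__":
    for name, problem in check_site_urls(os.environ).items():
        print(f"{name}: {problem}")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.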

Highlighted Details

  • Supports parallel experiments via BrowserGym.
  • Integrates popular benchmarks like VisualWebArena.
  • Offers unified leaderboard reporting.
  • Includes human annotator trajectories for ~170 tasks.

Maintenance & Community

The project is actively maintained, with recent updates enhancing its infrastructure through AgentLab. Resources for analysis and community interaction are available via Zeno and a dedicated website.

Licensing & Compatibility

The repository is released under a permissive license, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The demo sites are for browsing only; reproducible experiments require setting up and configuring your own standalone WebArena websites. The evaluation process involves specific setup steps for each website and obtaining auto-login cookies.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 8
  • Star history: 44 stars in the last 30 days
