ServiceNow: Benchmark for evaluating web agents on knowledge work
WorkArena is a benchmark suite for evaluating how well web agents perform common knowledge-work tasks on the ServiceNow platform. It is aimed at AI researchers and developers building agents for enterprise automation. It provides a standardized, browser-based environment for measuring agent performance on realistic workflows, with the goal of speeding up the development of agents that can assist knowledge workers.
How It Works
The benchmark uses the ServiceNow platform to build a diverse set of browser-based tasks. WorkArena-L1 contains 33 atomic tasks covering core ServiceNow UI components, with over 19,000 task instances in total. WorkArena++ composes these atomic tasks into more complex, realistic scenarios that test an agent's planning, reasoning, and memory. Evaluations are typically run with the AgentLab framework, which integrates with BrowserGym to run experiments in parallel and report results on a unified leaderboard.
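The difference between atomic and composed tasks can be illustrated with a toy model. The sketch below is hypothetical and is not the browsergym-workarena API: `AtomicTask`, `CompositeTask`, and `success_rate` are illustrative names, and a plain dict stands in for the state of a ServiceNow instance after an episode. It shows why a composed task is harder: it only counts as solved when every atomic subtask is solved in the same episode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical stand-in for the state of a ServiceNow instance after an
# episode: record fields mapped to values. Not the real WorkArena data model.
State = Dict[str, str]


@dataclass
class AtomicTask:
    """One atomic task: a natural-language goal plus a check on the final state."""
    name: str
    goal: str
    validate: Callable[[State], bool]


@dataclass
class CompositeTask:
    """A WorkArena++-style chain of atomic tasks (illustrative only)."""
    name: str
    subtasks: List[AtomicTask]

    def goal(self) -> str:
        # The agent gets one instruction covering every step, so it must
        # plan the order and remember which steps are still left.
        return ", then ".join(t.goal for t in self.subtasks)

    def validate(self, state: State) -> bool:
        # Solved only if every subtask's check passes on the same final state.
        return all(t.validate(state) for t in self.subtasks)


def success_rate(task: Union[AtomicTask, CompositeTask], final_states: List[State]) -> float:
    """Fraction of episodes whose final state passes the task's check."""
    return sum(task.validate(s) for s in final_states) / len(final_states)


# Hypothetical atomic tasks with made-up field names.
order_laptop = AtomicTask(
    "order-laptop", "order a standard laptop",
    lambda s: s.get("cart") == "standard laptop",
)
assign_incident = AtomicTask(
    "assign-incident", "assign incident INC001 to Beth",
    lambda s: s.get("INC001.assigned_to") == "Beth",
)
onboarding = CompositeTask("onboard-employee", [order_laptop, assign_incident])

# Made-up final states from four hypothetical agent episodes.
episodes: List[State] = [
    {"cart": "standard laptop", "INC001.assigned_to": "Beth"},
    {"cart": "standard laptop"},     # completed only the first step
    {"INC001.assigned_to": "Beth"},  # completed only the second step
    {"cart": "standard laptop", "INC001.assigned_to": "Beth"},
]

for task in (order_laptop, assign_incident, onboarding):
    print(task.name, success_rate(task, episodes))
# order-laptop 0.75
# assign-incident 0.75
# onboard-employee 0.5
```

Each atomic task is solved in 3 of the 4 episodes, yet the composite is solved in only 2, which matches the intuition that chaining steps puts extra pressure on planning and memory. Actual WorkArena tasks run against a live ServiceNow instance through BrowserGym; this toy omits the browser entirely.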
Quick Start & Requirements
Install with pip install browsergym-workarena, followed by playwright install. … (huggingface-cli login) is necessary.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
1 month ago
Inactive
TheAgentCompany
ServiceNow