WindowsAgentArena by microsoft

Scalable platform for testing multi-modal AI agents in a Windows OS environment

Created 1 year ago

826 stars

Top 42.8% on SourcePulse

Project Summary

Windows Agent Arena (WAA) is a scalable platform for testing and benchmarking multi-modal AI agents on Windows. It gives researchers and developers a reproducible environment for evaluating agentic AI workflows across diverse tasks, and it supports parallel agent deployment on Azure ML for faster benchmarking.

How It Works

WAA uses Docker to run a consistent Windows 11 VM environment. Agents interact with the OS through a Python server inside the VM, which executes their commands and returns screen state information. The platform supports several screen understanding models, including the open-sourced Omniparser, and lets you choose the accessibility backend (UI Automation or Win32) to suit an agent's needs and performance requirements.
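
The observe-and-act cycle described above can be pictured with a small Python sketch. Everything in it is hypothetical: the names Observation, FakeVMServer, run_episode, and the "DONE" sentinel are illustrative stand-ins, not WAA's actual interfaces, and the fake server only records commands instead of driving a real VM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Observation:
    """Screen state handed to the agent (hypothetical shape)."""
    screenshot_png: bytes
    a11y_tree: str


@dataclass
class FakeVMServer:
    """Stand-in for the in-VM Python server; it only records commands."""
    executed: List[str] = field(default_factory=list)

    def observe(self) -> Observation:
        # A real server would capture the screen and the accessibility tree.
        return Observation(screenshot_png=b"", a11y_tree=f"steps={len(self.executed)}")

    def execute(self, command: str) -> None:
        self.executed.append(command)


def run_episode(server: FakeVMServer,
                agent: Callable[[Observation], str],
                max_steps: int = 10) -> int:
    """Alternate observe/act until the agent says DONE; return steps taken."""
    for step in range(max_steps):
        command = agent(server.observe())
        if command == "DONE":
            return step
        server.execute(command)
    return max_steps


def scripted_agent(obs: Observation) -> str:
    """Toy policy: issue two fixed commands, then stop."""
    done_so_far = int(obs.a11y_tree.split("=")[1])
    script = ["open_notepad", "type_hello"]
    return script[done_so_far] if done_so_far < len(script) else "DONE"


if __name__ == "__main__":
    server = FakeVMServer()
    steps = run_episode(server, scripted_agent)
    print(steps, server.executed)
```

Keeping the agent as a plain callable from observation to command is one simple way to swap policies without touching the loop; a real agent would call a model where scripted_agent reads from a fixed script.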

Quick Start & Requirements

Install: Clone the repository, activate a Python 3.9 Conda environment, and install dependencies (pip install -r requirements.txt).
Prerequisites: Docker daemon (WSL 2 recommended on Windows), OpenAI or Azure OpenAI API Key, Python 3.9.
Setup: Local setup involves pulling a base Docker image, building the WAA image, preparing a Windows 11 VM snapshot (about 20 minutes), and configuring API keys. Azure deployment requires setting up Azure ML, uploading the VM image, and configuring experiments.
Links: Website, Paper
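
Before starting the setup, the prerequisites listed above could be checked with a short script along these lines. This is a sketch under assumptions: the environment variable names OPENAI_API_KEY and AZURE_API_KEY are guesses for illustration (the project may read keys from a config file instead), and finding the docker CLI on PATH does not prove the daemon is running.

```python
import os
import shutil
import sys


def check_prerequisites(version=None, env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    version, env, and which can be injected to make the check testable.
    """
    version = tuple(sys.version_info[:2]) if version is None else version
    env = os.environ if env is None else env
    problems = []

    if version != (3, 9):
        problems.append(f"Python 3.9 expected, found {version[0]}.{version[1]}")

    # Only checks that the CLI is installed, not that the daemon is up.
    if which("docker") is None:
        problems.append("docker CLI not found on PATH")

    # Assumed variable names, for illustration only.
    if not (env.get("OPENAI_API_KEY") or env.get("AZURE_API_KEY")):
        problems.append("no OpenAI or Azure OpenAI API key found in environment")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```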

Highlighted Details

Supports scalable deployment and parallel execution on Azure ML for faster benchmarking.
Offers multiple screen understanding (som_origin) and accessibility backend (a11y_backend) configurations for varied agent capabilities.
Includes the open-sourced Omniparser, a top-performing screen understanding model.
Provides a "Bring Your Own Agent" (BYOA) framework for custom agent integration.
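
To illustrate how som_origin and a11y_backend combine into a set of runs that can be spread over parallel workers, here is a minimal Python sketch. The option values ("oss", "omni", "uia", "win32"), the experiment dictionary layout, and the round-robin assignment are assumptions made for illustration; consult the repository for the values and experiment format WAA actually accepts.

```python
from itertools import product

# Assumed option values, for illustration only.
SOM_ORIGINS = ["oss", "omni"]
A11Y_BACKENDS = ["uia", "win32"]


def build_experiments(som_origins, a11y_backends):
    """One experiment per (som_origin, a11y_backend) combination."""
    return [
        {"name": f"{som}-{backend}", "som_origin": som, "a11y_backend": backend}
        for som, backend in product(som_origins, a11y_backends)
    ]


def assign_to_workers(experiments, num_workers):
    """Distribute experiments over workers in round-robin order."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    buckets = [[] for _ in range(num_workers)]
    for index, experiment in enumerate(experiments):
        buckets[index % num_workers].append(experiment)
    return buckets


if __name__ == "__main__":
    experiments = build_experiments(SOM_ORIGINS, A11Y_BACKENDS)
    for worker_id, bucket in enumerate(assign_to_workers(experiments, 3)):
        print(worker_id, [e["name"] for e in bucket])
```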

Maintenance & Community

The project is actively maintained by Microsoft, with recent updates including a new difficulty mode and the release of Omniparser. Contributions are welcomed for new agents and tasks.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Local deployment requires significant disk space (about 30 GB for the VM image) and can be resource-intensive. Running without KVM acceleration is not recommended because performance degrades. Azure deployment incurs cloud costs.
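
A rough local check for the two constraints above (free disk space and KVM) might look like the following Python sketch. It assumes a Linux host where KVM is exposed as /dev/kvm, treats the 30 GB figure as an approximate threshold, and uses hypothetical function names.

```python
import os
import shutil

REQUIRED_GB = 30  # approximate VM image size quoted above


def has_enough_disk(path=".", required_gb=REQUIRED_GB, free_bytes=None):
    """True if the filesystem holding path has at least required_gb free.

    free_bytes can be passed in directly, which makes the check testable.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


def kvm_available(device="/dev/kvm"):
    """True if the KVM device node exists and is readable and writable."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("enough disk:", has_enough_disk())
    print("KVM available:", kvm_available())
```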
