WindowsAgentArena by microsoft

Scalable platform for testing multi-modal AI agents in a Windows OS environment

Created 1 year ago
766 stars

Top 45.6% on SourcePulse

Project Summary

Windows Agent Arena (WAA) is a scalable platform for testing and benchmarking multi-modal AI agents on a Windows OS. It provides a reproducible environment for researchers and developers to evaluate agentic AI workflows across diverse tasks, enabling rapid benchmarking with parallel agent deployment via Azure ML.

How It Works

WAA uses Docker to host a consistent Windows 11 VM. Agents interact with the OS through a Python server running inside the VM, which executes their commands and returns screen state information. The platform supports several screen understanding models, including the open-source Omniparser, and lets you choose the accessibility backend (UI Automation or Win32) to match an agent's needs and performance requirements.
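
The interaction described above can be pictured as an observe, predict, execute loop. The sketch below is illustrative only: `FakeWindowsEnv`, `EchoAgent`, `run_episode`, and the `"DONE"` sentinel are hypothetical names invented for this example and are not taken from the WAA codebase. The stand-in environment simply records commands instead of talking to a VM.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeWindowsEnv:
    """Stand-in for the in-VM Python server (hypothetical, not the WAA API)."""

    max_steps: int = 5
    executed: List[str] = field(default_factory=list)

    def observe(self) -> Dict[str, Any]:
        # A real server would return a screenshot and accessibility data here.
        return {"screenshot": b"", "step": len(self.executed)}

    def execute(self, command: str) -> None:
        # A real server would run the command inside the Windows 11 VM.
        self.executed.append(command)


class EchoAgent:
    """Toy agent: replays a fixed plan, then signals completion."""

    def __init__(self, plan: List[str]) -> None:
        self.plan = plan

    def predict(self, instruction: str, obs: Dict[str, Any]) -> str:
        step = obs["step"]
        return self.plan[step] if step < len(self.plan) else "DONE"


def run_episode(env: FakeWindowsEnv, agent: EchoAgent, instruction: str) -> List[str]:
    """Alternate observe -> predict -> execute until DONE or the step budget ends."""
    for _ in range(env.max_steps):
        command = agent.predict(instruction, env.observe())
        if command == "DONE":
            break
        env.execute(command)
    return env.executed
```

A real agent would replace the fixed plan with a model call that reads the screen state, but the control flow would likely keep the same shape.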

Quick Start & Requirements

  • Install: Clone the repository, create and activate a Python 3.9 Conda environment, and install dependencies with pip install -r requirements.txt.
  • Prerequisites: A running Docker daemon (WSL 2 recommended on Windows), an OpenAI or Azure OpenAI API key, and Python 3.9.
  • Setup: Local setup involves pulling a base Docker image, building the WAA image, preparing a Windows 11 VM snapshot (~20 mins), and configuring API keys. Azure deployment requires Azure ML setup, uploading the VM image, and configuring experiments.
  • Links: Website, Paper
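
Before starting the setup, it can help to verify the listed prerequisites programmatically. The following is a minimal sketch, not part of WAA: the function name `missing_prerequisites` is hypothetical, and the environment variable names in `API_KEY_VARS` are assumptions about how the keys might be exported. Note that finding the `docker` CLI on the PATH does not prove the daemon is running.

```python
import os
import shutil
import sys
from typing import List, Mapping, Optional, Sequence

# Assumed variable names; adjust to however your keys are actually stored.
API_KEY_VARS = ("OPENAI_API_KEY", "AZURE_API_KEY")


def missing_prerequisites(
    python_version: Sequence[int],
    docker_path: Optional[str],
    env: Mapping[str, str],
) -> List[str]:
    """Return a human-readable list of prerequisites that appear to be missing."""
    problems: List[str] = []
    if tuple(python_version[:2]) != (3, 9):
        problems.append("Python 3.9 is expected")
    if docker_path is None:
        problems.append("docker CLI not found on PATH")
    if not any(env.get(name) for name in API_KEY_VARS):
        problems.append("no OpenAI or Azure OpenAI API key found in the environment")
    return problems


if __name__ == "__main__":
    found = missing_prerequisites(sys.version_info[:2], shutil.which("docker"), os.environ)
    for problem in found:
        print("missing:", problem)
```

Passing the version, Docker path, and environment in as arguments keeps the check easy to test without touching the real machine state.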

Highlighted Details

  • Supports scalable deployment and parallel execution on Azure ML for faster benchmarking.
  • Offers multiple screen understanding (som_origin) and accessibility backend (a11y_backend) configurations for varied agent capabilities.
  • Includes the open-sourced Omniparser, a top-performing screen understanding model.
  • Provides a "Bring Your Own Agent" (BYOA) framework for custom agent integration.
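
To illustrate how the `som_origin` and `a11y_backend` settings combine with parallel execution, the sketch below enumerates every configuration pair and splits the resulting runs across workers. This is an assumption-laden illustration rather than WAA's actual scheduling logic: the option spellings (`"omni"`, `"a11y"`, `"uia"`, `"win32"`) and the helpers `build_configs` and `shard` are placeholders chosen for this example.

```python
from itertools import product
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# Assumed option spellings, for illustration only.
SOM_ORIGINS = ["omni", "a11y"]
A11Y_BACKENDS = ["uia", "win32"]


def build_configs(
    som_origins: Sequence[str], a11y_backends: Sequence[str]
) -> List[Dict[str, str]]:
    """Build one run configuration per (som_origin, a11y_backend) pair."""
    return [
        {"som_origin": som, "a11y_backend": backend}
        for som, backend in product(som_origins, a11y_backends)
    ]


def shard(items: Sequence[T], num_workers: int) -> List[List[T]]:
    """Split items round-robin so each worker gets a near-equal share."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [list(items[i::num_workers]) for i in range(num_workers)]
```

With the two lists above, `build_configs` yields four configurations, and `shard(configs, 3)` assigns two of them to the first worker and one to each of the others.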

Maintenance & Community

The project is actively maintained by Microsoft, with recent updates including a new difficulty mode and the release of Omniparser. Contributions are welcomed for new agents and tasks.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Local deployment requires significant disk space (~30 GB for the VM image) and can be resource-intensive. Running without KVM acceleration is not recommended because performance degrades noticeably. Azure deployment incurs cloud costs.
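
The disk and KVM caveats can be checked up front on a Linux host. The sketch below is a hedged example, not a WAA utility: the helper names are hypothetical, the 30 GB threshold is the approximate figure quoted above, and `/dev/kvm` is the standard Linux KVM device node (it will not exist on hosts without KVM support).

```python
import os
import shutil

REQUIRED_GB = 30  # approximate VM image size quoted above


def has_enough_disk(free_bytes: int, required_gb: int = REQUIRED_GB) -> bool:
    """True if the free space covers the approximate VM image size."""
    return free_bytes >= required_gb * 1024**3


def kvm_available(device: str = "/dev/kvm") -> bool:
    """True if the KVM device node exists and is readable and writable."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    print("enough disk:", has_enough_disk(free))
    print("KVM available:", kvm_available())
```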

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days
