Scalable platform for testing multi-modal AI agents in a Windows OS environment
Windows Agent Arena (WAA) is a scalable platform for testing and benchmarking multi-modal AI agents on a Windows OS. It provides a reproducible environment for researchers and developers to evaluate agentic AI workflows across diverse tasks, enabling rapid benchmarking with parallel agent deployment via Azure ML.
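The page does not say how WAA divides a benchmark across parallel Azure ML workers. The minimal sketch below shows one common way to do it: each worker gets a disjoint round-robin shard of task IDs. `shard_tasks` and the placeholder task IDs are hypothetical and do not come from WAA.

```python
# Hypothetical sketch: split a benchmark's task list into disjoint shards so
# each parallel worker (e.g. one cloud VM) evaluates its own subset.
# The round-robin scheme and all names are illustrative assumptions,
# not WAA's actual Azure ML distribution logic.
from typing import List, Sequence


def shard_tasks(task_ids: Sequence[str], num_workers: int) -> List[List[str]]:
    """Assign tasks round-robin so shard sizes differ by at most one."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    shards: List[List[str]] = [[] for _ in range(num_workers)]
    for i, task_id in enumerate(task_ids):
        shards[i % num_workers].append(task_id)
    return shards


if __name__ == "__main__":
    tasks = [f"task_{n:03d}" for n in range(10)]  # placeholder task IDs
    for worker, shard in enumerate(shard_tasks(tasks, 3)):
        print(worker, shard)
```

Round-robin keeps shard sizes within one task of each other. If a task list is grouped by application, it also spreads each group across workers instead of giving one worker a single application's block.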
How It Works
WAA utilizes Docker to create a consistent Windows 11 VM environment. Agents interact with the OS through a Python server within the VM, executing commands and receiving screen state information. The platform supports various screen understanding models, including the open-sourced Omniparser, and offers flexibility in accessibility backends (UI Automation or Win32) to cater to different agent needs and performance requirements.
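The observe-then-act exchange between an agent and the server inside the VM can be sketched as follows. Everything here is an assumption for illustration, not WAA's actual server API: the endpoint paths `/observe` and `/execute`, the JSON fields, the in-process stub server, and the labels `"uia"`/`"win32"` for the UI Automation and Win32 backends. Only the parameter name `a11y_backend` comes from the page itself.

```python
# Hypothetical sketch of an agent talking to a Python server inside the VM.
# Endpoint paths (/observe, /execute), JSON fields, and the stub server are
# illustrative assumptions, not Windows Agent Arena's actual server API.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import Request, urlopen

# Assumed option labels for the UI Automation and Win32 backends.
A11Y_BACKENDS = {"uia", "win32"}


class StubVMServer(BaseHTTPRequestHandler):
    """Stand-in for the server that would run inside the Windows VM."""

    def _reply(self, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # /observe: return a fake screen state tagged with the requested backend.
        query = parse_qs(urlparse(self.path).query)
        backend = query.get("a11y_backend", ["uia"])[0]
        self._reply({"a11y_backend": backend, "elements": ["Start", "Notepad"]})

    def do_POST(self) -> None:
        # /execute: a real server would run the command; the stub only echoes it.
        length = int(self.headers.get("Content-Length", 0))
        command = json.loads(self.rfile.read(length))["command"]
        self._reply({"status": "ok", "executed": command})

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


class AgentClient:
    """Agent-side client: observe screen state, then send a command."""

    def __init__(self, base_url: str, a11y_backend: str = "uia") -> None:
        if a11y_backend not in A11Y_BACKENDS:
            raise ValueError(f"unknown a11y_backend: {a11y_backend!r}")
        self.base_url = base_url
        self.a11y_backend = a11y_backend

    def observe(self) -> dict:
        url = f"{self.base_url}/observe?a11y_backend={self.a11y_backend}"
        with urlopen(url, timeout=5) as resp:
            return json.load(resp)

    def execute(self, command: str) -> dict:
        data = json.dumps({"command": command}).encode()
        req = Request(f"{self.base_url}/execute", data=data,
                      headers={"Content-Type": "application/json"})
        with urlopen(req, timeout=5) as resp:
            return json.load(resp)


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StubVMServer)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    client = AgentClient(f"http://127.0.0.1:{server.server_port}", "win32")
    state = client.observe()  # one observe -> act step
    print(state)
    print(client.execute(f"click({state['elements'][1]!r})"))  # hypothetical command
    server.shutdown()
    server.server_close()
```

The client validates `a11y_backend` before sending anything, so a typo fails immediately and never reaches the VM. Only the base URL would point elsewhere if the stub were replaced by a real endpoint, and that endpoint's request format is not documented on this page.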
Quick Start & Requirements
pip install -r requirements.txt
Highlighted Details
Configurable screen-understanding (som_origin) and accessibility backend (a11y_backend) options for varied agent capabilities.
Maintenance & Community
The project is actively maintained by Microsoft, with recent updates including a new difficulty mode and the release of Omniparser. Contributions are welcomed for new agents and tasks.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Local deployment requires significant disk space (~30GB for the VM image) and can be resource-intensive. Running without KVM acceleration is not recommended due to performance degradation. Azure deployment incurs cloud costs.
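The caveats above suggest two checks worth running before a local deployment: about 30GB of free disk for the VM image, and KVM acceleration. The minimal sketch below assumes a Linux host, where KVM is exposed as `/dev/kvm`. The function name and threshold constant are hypothetical and not part of WAA.

```python
# Hypothetical pre-flight check for a local run, based on the caveats above:
# ~30GB free disk for the VM image and KVM acceleration.
# The /dev/kvm check applies to Linux hosts only; names and the threshold
# are illustrative assumptions, not part of WAA.
import os
import shutil

GB = 10 ** 9
MIN_FREE_GB = 30  # approximate VM image size quoted above


def preflight(path: str = ".", min_free_gb: float = MIN_FREE_GB) -> dict:
    """Report whether the host looks ready for a local deployment."""
    free_gb = shutil.disk_usage(path).free / GB
    # True only if the KVM device node exists and this user can read/write it.
    kvm_usable = os.access("/dev/kvm", os.R_OK | os.W_OK)
    return {
        "free_gb": round(free_gb, 1),
        "enough_disk": free_gb >= min_free_gb,
        "kvm_usable": kvm_usable,
    }


if __name__ == "__main__":
    report = preflight()
    print(report)
    if not report["enough_disk"]:
        print(f"Warning: less than {MIN_FREE_GB}GB free for the VM image.")
    if not report["kvm_usable"]:
        print("Warning: /dev/kvm not usable; the VM would run without KVM acceleration.")
```

`os.access` is used instead of a plain existence test because a present but permission-restricted `/dev/kvm` would still leave the VM without acceleration.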