OSWorld by xlang-ai

Multimodal agent benchmark for open-ended tasks in realistic computer environments

created 1 year ago
1,995 stars

Top 22.5% on sourcepulse

Project Summary

OSWorld is a benchmark that evaluates multimodal agents on open-ended tasks in real computer environments. It targets AI researchers and developers building agents that interact with graphical user interfaces, and it covers realistic desktop and web application scenarios.

How It Works

OSWorld uses virtual machine technology (VMware, VirtualBox, Docker) to create isolated, reproducible environments that mimic real computer systems. Agents observe these environments through screenshots and, optionally, accessibility tree information, and they act through simulated mouse and keyboard input (e.g., using pyautogui). This allows complex, multi-step tasks to be executed and evaluated in a controlled yet realistic setting.
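The observe-then-act cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: `FakeDesktopEnv`, `Observation`, `run_episode`, and `scripted_policy` are hypothetical names invented here, not OSWorld's actual API, and the fake environment records action strings instead of running them in a VM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Observation:
    """What the agent sees at each step (hypothetical structure)."""
    screenshot: bytes                          # raw image bytes of the screen
    accessibility_tree: Optional[str] = None   # optional structured UI dump


@dataclass
class FakeDesktopEnv:
    """Stand-in for a VM-backed environment; it only records actions."""
    goal_action: str
    max_steps: int = 5
    history: List[str] = field(default_factory=list)

    def reset(self) -> Observation:
        self.history.clear()
        return Observation(screenshot=b"", accessibility_tree="<desktop/>")

    def step(self, action: str) -> Tuple[Observation, float, bool]:
        # A real environment would execute the action inside the VM here.
        self.history.append(action)
        success = action == self.goal_action
        done = success or len(self.history) >= self.max_steps
        return Observation(b"", "<desktop/>"), (1.0 if success else 0.0), done


def run_episode(env: FakeDesktopEnv,
                policy: Callable[[Observation], str],
                max_steps: int = 10) -> float:
    """Run one task: observe, choose an action, act, repeat until done."""
    obs = env.reset()
    reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        if done:
            break
    return reward


def scripted_policy() -> Callable[[Observation], str]:
    """A fixed action sequence standing in for a multimodal model."""
    actions = iter([
        "pyautogui.click(100, 200)",
        "pyautogui.write('hello')",
        "pyautogui.press('enter')",
    ])
    return lambda obs: next(actions)
```

In a real run the policy would be a model call that maps the screenshot (and accessibility tree, if used) to the next action string, and the reward would come from the benchmark's task-specific evaluation rather than a string comparison.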

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment (Python >= 3.9), and install dependencies via pip install -r requirements.txt.
  • Prerequisites: Requires either VMware Workstation Pro (VMware Fusion on Apple silicon) with the vmrun command configured, or Docker with KVM support.
  • Setup: The setup script automatically downloads and configures necessary virtual machines.
  • Documentation: Website, Paper, Doc
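Before running the setup, it can help to check which of the prerequisites above are present on the host. The following is a minimal sketch, not part of OSWorld: `check_prerequisites` is a hypothetical helper, and it assumes that `vmrun` and `docker` are expected on the PATH and that KVM is exposed as `/dev/kvm`, as is usual on Linux.

```python
import os
import shutil
from typing import Callable, Dict, Optional


def check_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
    path_exists: Callable[[str], bool] = os.path.exists,
) -> Dict[str, bool]:
    """Report which VM providers look usable on this host.

    The lookups are injectable so the check can be exercised without
    VMware or Docker installed.
    """
    has_vmrun = which("vmrun") is not None      # VMware command-line tool
    has_docker = which("docker") is not None    # Docker CLI
    has_kvm = path_exists("/dev/kvm")           # Linux KVM device node
    return {
        "vmware": has_vmrun,
        "docker": has_docker and has_kvm,
    }


if __name__ == "__main__":
    for provider, usable in check_prerequisites().items():
        print(f"{provider}: {'ok' if usable else 'missing prerequisites'}")
```

The check is deliberately shallow: finding a binary on the PATH does not confirm that the license, version, or permissions are adequate.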

Highlighted Details

  • Supports multiple VM providers: VMware, VirtualBox, and Docker.
  • Offers various observation types: screenshots, accessibility trees, etc.
  • Includes baseline agents for GPT-4V, Gemini-ProV, and Claude-3 Opus.
  • Provides detailed evaluation metrics and results visualization tools.

Maintenance & Community

The project is associated with NeurIPS 2024 and is under active development; recent updates added Docker support and expanded the VM provider options. A Discord server is available for community engagement.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Commercial use or closed-source linking would require clarifying the licensing terms first.

Limitations & Caveats

VMware support on macOS may have limitations, and KVM support is generally not available on macOS hosts. Running experiments can be time-consuming and incur costs, especially with powerful models and extensive testing. Residual Docker containers may require manual cleanup.
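For the residual-container caveat, a manual cleanup might look like the sketch below. It assumes the leftover containers can be identified by the image they were started from; the image name is left as a parameter because it is not given here, and `cleanup` is a hypothetical helper rather than something OSWorld ships. The `docker ps` and `docker rm` flags used are standard Docker CLI options.

```python
import subprocess
from typing import Callable, List


def list_containers_cmd(image: str) -> List[str]:
    """Command listing IDs of all containers created from `image`."""
    # -a includes stopped containers, -q prints only container IDs.
    return ["docker", "ps", "-a", "-q", "--filter", f"ancestor={image}"]


def remove_containers_cmd(container_ids: List[str]) -> List[str]:
    """Command force-removing the given containers."""
    # -f also removes containers that are still running.
    return ["docker", "rm", "-f", *container_ids]


def cleanup(image: str, run: Callable = subprocess.run) -> List[str]:
    """Remove every container created from `image`; return the removed IDs."""
    listed = run(list_containers_cmd(image),
                 capture_output=True, text=True, check=True)
    container_ids = listed.stdout.split()
    if container_ids:
        run(remove_containers_cmd(container_ids), check=True)
    return container_ids
```

Because `docker rm -f` is destructive, it is worth printing the output of the listing command and reviewing it before removing anything.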

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 51
  • Issues (30d): 13

Star History

191 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

cua by trycua

0.5%
9k
AI agent framework for computer OS control in virtual containers
created 6 months ago
updated 2 days ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 1 more.

SuperAGI by TransformerOptimus

0.2%
17k
Open-source framework for autonomous AI agent development
created 2 years ago
updated 6 months ago