computer_use_ootb  by showlab

GUI agent for Windows and macOS

created 9 months ago
1,646 stars

Top 26.1% on sourcepulse

Project Summary

This project provides an out-of-the-box GUI agent for Windows and macOS, enabling users to control their computers via natural language commands. It supports both API-based models like Claude 3.5 and locally-run models such as ShowUI and UI-TARS, offering a cost-effective and flexible solution for automating desktop tasks.

How It Works

The agent acts as a unified planner and actor, capable of high-level decision-making and low-level action execution on the desktop. It leverages various Large Language Models (LLMs) and Vision-Language-Action (VLA) models, supporting both remote API calls and local inference. This dual approach allows users to choose between convenience and cost savings, with local models like ShowUI offering significant cost reductions compared to API-based solutions.
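The planner/actor split described above can be sketched as a simple loop. This is a minimal illustration, not the project's actual code: the `Action` schema, the `run_agent` function, and the callback signatures are all hypothetical names chosen for this example. The planner and actor callbacks stand in for whichever model is used, whether a remote API or a local one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """A low-level desktop action (hypothetical schema, not the project's own)."""
    kind: str                  # e.g. "click", "type", or "done"
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


# A planner maps (task, screenshot, history) to a natural-language sub-goal.
Planner = Callable[[str, bytes, List[Action]], str]
# An actor maps (sub-goal, screenshot) to one concrete action.
Actor = Callable[[str, bytes], Action]


def run_agent(
    task: str,
    capture: Callable[[], bytes],
    planner: Planner,
    actor: Actor,
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Alternate planning and acting until the actor reports "done"
    or the step limit is reached. Returns every action the actor proposed."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                        # observe the desktop
        subgoal = planner(task, screenshot, history)  # high-level decision
        action = actor(subgoal, screenshot)           # low-level grounding
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                               # perform the action
    return history
```

Because the planner and actor are plain callables, swapping an API-backed model for a locally run one would only change which functions are passed in, under the assumptions of this sketch.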

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using pip install -r requirements.txt.
  • Prerequisites: Miniconda (Python >= 3.12).
  • Optional (Local Run):
    • ShowUI: NVIDIA GPU (>=6GB VRAM) for Windows, M1 chip (>=16GB RAM) for macOS. Download model files via python install_tools/install_showui.py.
    • UI-TARS: Deploy a UI-TARS server locally.
  • API Keys: Required for API-based models (Anthropic, Qwen, OpenAI).
  • Setup: API-based use needs little setup beyond installing dependencies and supplying an API key; local models additionally require downloading model files and possibly configuring GPU drivers.
  • Links: Project Page, arXiv, Demo Video.
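Before launching, the prerequisites above could be checked with a small pre-flight script. The sketch below is illustrative only. The Python version floor comes from the list above, but the environment variable names are assumptions based on common SDK conventions, and the project may collect API keys in another way (for example through its interface). `preflight` and `ASSUMED_KEY_VARS` are hypothetical names.

```python
from typing import List, Mapping, Sequence

MIN_PYTHON = (3, 12)  # minimum version from the prerequisites above

# Assumed variable names; the project may not read keys from the environment.
ASSUMED_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
}


def preflight(
    python_version: Sequence[int],
    env: Mapping[str, str],
    providers: Sequence[str],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"need >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    for provider in providers:
        var = ASSUMED_KEY_VARS.get(provider)
        if var is None:
            problems.append(f"unknown provider: {provider}")
        elif not env.get(var):
            problems.append(f"missing {var} for {provider}")
    return problems
```

A typical call would be `preflight(sys.version_info, os.environ, ["anthropic"])`, printing any returned problems before starting the agent.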

Highlighted Details

  • Supports multiple displays and arbitrary resolutions.
  • Offers remote control via mobile devices without app installation.
  • Includes a 4-bit quantized ShowUI model for efficient inference.
  • Supports a range of models including Claude 3.5 Sonnet, GPT-4o, Qwen2-VL, and ShowUI.
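Supporting multiple displays and arbitrary resolutions generally means translating a model's predicted location into absolute desktop pixels. The sketch below assumes the model returns coordinates normalized to the range 0 to 1 relative to the screenshot of one monitor. That is an assumption for illustration, and the `Monitor` class and `to_screen_pixels` function are hypothetical, not taken from the project.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Monitor:
    """Position and size of one display in virtual-desktop pixels."""
    left: int
    top: int
    width: int
    height: int


def to_screen_pixels(rel_x: float, rel_y: float, monitor: Monitor) -> Tuple[int, int]:
    """Map normalized (0..1) coordinates on a monitor's screenshot to
    absolute pixel coordinates on the virtual desktop."""
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must lie in [0, 1]")
    # Clamp so that 1.0 maps to the last valid pixel, not one past the edge.
    px = monitor.left + min(int(rel_x * monitor.width), monitor.width - 1)
    py = monitor.top + min(int(rel_y * monitor.height), monitor.height - 1)
    return px, py
```

For example, with a second display placed to the right of a 1920-pixel-wide primary one, `to_screen_pixels(0.5, 0.5, Monitor(1920, 0, 2560, 1440))` returns `(3200, 720)`.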

Maintenance & Community

  • Active development with recent updates adding local run capabilities and UI-TARS support.
  • Community discussion available via Discord.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Model performance is still limited, and the models may generate unintended or potentially harmful outputs, so the agent requires continuous monitoring while it runs. API-based tasks can incur costs. The stability of the Claude API for task solving is still under investigation.
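One way to approach the monitoring concern is to put a human approval step in front of riskier actions and to cap how many actions a run may take. The following is a hedged sketch rather than a feature of the project: the `supervise` function, the dictionary-based action format, and the `RISKY_KINDS` set are all hypothetical.

```python
from typing import Callable, Iterable, List

# Hypothetical set of action kinds that need explicit approval.
RISKY_KINDS = {"type", "key", "drag"}


def supervise(
    actions: Iterable[dict],
    approve: Callable[[dict], bool],
    max_actions: int = 20,
) -> List[dict]:
    """Return the actions allowed to run, stopping at the first rejected
    risky action or once max_actions have been allowed."""
    allowed: List[dict] = []
    for action in actions:
        if len(allowed) >= max_actions:
            break
        if action.get("kind") in RISKY_KINDS and not approve(action):
            break  # a rejection ends the run instead of skipping ahead
        allowed.append(action)
    return allowed
```

Stopping at the first rejection, instead of skipping the rejected action, is a deliberate choice in this sketch: later actions were planned on the assumption that earlier ones succeeded.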

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 123 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Robert Stojnic (creator of Papers with Code).

Agent-S by simular-ai

1.2%
6k
Agentic framework for autonomous computer interaction
created 9 months ago
updated 20 hours ago