GUI agent for Windows and macOS
This project provides an out-of-the-box GUI agent for Windows and macOS, enabling users to control their computers via natural language commands. It supports both API-based models like Claude 3.5 and locally-run models such as ShowUI and UI-TARS, offering a cost-effective and flexible solution for automating desktop tasks.
How It Works
The agent acts as a unified planner and actor: it makes high-level decisions and executes low-level actions on the desktop. It draws on Large Language Models (LLMs) and Vision-Language-Action (VLA) models, served either through remote API calls or through local inference. Users can therefore trade convenience against cost, since local models such as ShowUI cost significantly less to run than API-based solutions.
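The planner/actor loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Action`, `Backend`, `ScriptedBackend`, and `run_agent` are hypothetical names, and the scripted backend merely stands in for a remote API model or a local model such as ShowUI.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Action:
    """One low-level desktop step (hypothetical schema)."""
    kind: str            # e.g. "click", "type", or "done"
    argument: str = ""


class Backend(Protocol):
    """Anything that maps (instruction, observation) to the next action.

    A remote API client and a locally run model could both satisfy this
    interface, which is what lets the loop below stay backend-agnostic.
    """

    def next_action(self, instruction: str, observation: str) -> Action: ...


class ScriptedBackend:
    """Placeholder backend that replays a fixed list of actions."""

    def __init__(self, actions: List[Action]) -> None:
        self._actions = list(actions)

    def next_action(self, instruction: str, observation: str) -> Action:
        # Signal completion once the script is exhausted.
        return self._actions.pop(0) if self._actions else Action("done")


def run_agent(
    instruction: str,
    backend: Backend,
    observe: Callable[[], str],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, ask the backend for an action, execute it, and repeat."""
    history: List[Action] = []
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        action = backend.next_action(instruction, observe())
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


if __name__ == "__main__":
    backend = ScriptedBackend([Action("click", "Start"), Action("type", "notepad")])
    steps = run_agent(
        "open notepad",
        backend,
        observe=lambda: "<screenshot placeholder>",
        execute=lambda a: print(f"executing {a.kind}: {a.argument}"),
    )
    print(f"{len(steps)} steps executed")
```

Keeping the backend behind a small interface is one way to make the API-versus-local choice a configuration detail; the `max_steps` cap is an assumption added here so that a misbehaving model cannot act indefinitely.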
Quick Start & Requirements
pip install -r requirements.txt
python install_tools/install_showui.py
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The models are still limited in capability and may generate unintended or potentially harmful outputs, so they require continuous monitoring. API-based tasks can incur costs. The stability of the Claude API for task solving is under investigation.