ShowUI-Aloha by showlab

Human-taught agent for automating desktop workflows

Created 2 months ago
254 stars

Top 99.1% on SourcePulse

Project Summary

ShowUI-Aloha is a human-taught computer-use agent designed to automate workflows on Windows and macOS desktops. It addresses the challenge of creating adaptable agents by learning from human demonstrations, enabling generalization to new task variants rather than simple memorization. This benefits users seeking to automate repetitive desktop tasks by providing a system that can understand and execute novel sequences of actions based on learned patterns.

How It Works

The system operates through a four-stage pipeline: Recorder, Learner, Planner, and Actor/Executor. The Recorder captures user interactions, including screen, mouse, and keyboard inputs. The Learner processes these demonstrations to extract semantic action traces, focusing on abstraction rather than rote memorization. The Planner then uses these traces to devise execution strategies for new task variants. Finally, the Actor and Executor reliably perform OS-level actions such as clicks, drags, typing, and scrolling to carry out the planned task. This approach allows a single demonstration to generalize across a family of related tasks.
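The four-stage pipeline described above can be sketched in miniature. All class and function names below are illustrative only and do not come from the ShowUI-Aloha codebase; the real Learner abstracts over raw screen pixels and VLM output rather than copying events one-to-one.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    kind: str        # "click", "type", "scroll", ...
    target: str      # semantic description of the UI element
    payload: str = ""

@dataclass
class Trace:
    """Semantic action trace extracted from one demonstration."""
    steps: list[Event] = field(default_factory=list)

def record_demo() -> list[Event]:
    # Recorder: capture raw screen/mouse/keyboard interactions.
    return [Event("click", "File menu"),
            Event("click", "Save As"),
            Event("type", "filename box", "report.pdf")]

def learn(raw: list[Event]) -> Trace:
    # Learner: abstract raw events into a semantic trace
    # (trivially 1:1 here; the real system generalizes, not memorizes).
    return Trace(steps=list(raw))

def plan(trace: Trace, new_input: str) -> list[Event]:
    # Planner: adapt the learned trace to a new task variant,
    # e.g. substituting the filename that was typed in the demo.
    return [Event(ev.kind, ev.target, new_input) if ev.kind == "type" else ev
            for ev in trace.steps]

def act(steps: list[Event], execute: Callable[[Event], None]) -> None:
    # Actor/Executor: perform each OS-level action in order.
    for ev in steps:
        execute(ev)

# One demonstration generalizes to a new variant of the same task:
trace = learn(record_demo())
steps = plan(trace, "summary.pdf")
act(steps, lambda ev: print(ev.kind, ev.target, ev.payload))
```

The key idea the sketch illustrates is that planning operates on the semantic trace, not the raw recording, which is what lets a single demonstration cover a family of related tasks.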

Quick Start & Requirements

  • Requirements: Windows 10+ or macOS, Python 3.10+, at least one VLM API key (OpenAI or Claude).
  • Installation: Clone the repository, create a Python virtual environment, and install dependencies using pip install -r requirements.txt. API keys must be configured in config/api_keys.json.
  • Recorder: Download the Aloha.Screen.Recorder.exe or Aloha.Screen.Recorder-arm64.dmg from the project's Releases.
  • Usage: Record a demonstration using the Recorder, parse it into a trace using python Aloha_Learn/parser.py {project_name}, and execute tasks via python Aloha_Act/scripts/aloha_run.py --task "Your task" --trace_id "{trace_id}".
  • Links: Repository: https://github.com/showlab/ShowUI-Aloha
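Assembled from the bullets above, a typical first session might look like the following. The repository URL and script paths are taken from the README; the virtual-environment name and the project/trace identifiers are placeholders.

```shell
# Clone the repository and install dependencies
git clone https://github.com/showlab/ShowUI-Aloha
cd ShowUI-Aloha
python -m venv .venv
source .venv/bin/activate        # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Add your OpenAI and/or Claude API keys to config/api_keys.json,
# and record a demonstration with the downloaded Recorder app. Then:

# Parse the recording into a semantic trace ("my_project" is a placeholder)
python Aloha_Learn/parser.py my_project

# Execute a task (possibly a new variant) against the learned trace
python Aloha_Act/scripts/aloha_run.py --task "Your task" --trace_id "my_trace_id"
```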

Highlighted Details

  • Solved 217 of 361 OSWorld-style tasks end-to-end, evaluated with a strict binary success metric.
  • Supports both Windows and macOS operating systems.
  • Features a modular architecture comprising Recorder, Learner, Actor, and Executor components.

Maintenance & Community

The project is associated with an arXiv publication dated 2026, suggesting recent development activity. No specific community channels (like Discord or Slack) or details on maintainers/sponsors are provided in the README.

Licensing & Compatibility

The project is released under the Apache-2.0 License. This license is permissive and generally compatible with commercial use and closed-source linking, allowing for broad adoption.

Limitations & Caveats

Support for Linux is explicitly listed as a future roadmap item, indicating it is not currently supported. The agent's effectiveness is dependent on the quality and representativeness of the human demonstrations provided for learning.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 77 stars in the last 30 days

Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), Gabriel Almeida (Cofounder of Langflow), and 9 more.

Explore Similar Projects

terminal-bench by harbor-framework (2.4%, 2k stars)
Benchmark for LLM agents in real terminal environments
Created 1 year ago, updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Assaf Elovic (Cofounder of Tavily), and 2 more.

XAgent by OpenBMB (0.1%, 9k stars)
Autonomous LLM agent for complex task solving
Created 2 years ago, updated 1 year ago
Starred by Paul Stamatiou (Cofounder of Limitless), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.

claude-task-master by eyaltoledano (0.4%, 26k stars)
AI-powered task management system for code editors
Created 1 year ago, updated 4 days ago