ShowUI-Aloha by showlab

Human-taught agent for automating desktop workflows

Created 2 months ago
254 stars

Top 99.1% on SourcePulse

Project Summary

ShowUI-Aloha is a human-taught computer-use agent designed to automate workflows on Windows and macOS desktops. It addresses the challenge of creating adaptable agents by learning from human demonstrations, enabling generalization to new task variants rather than simple memorization. This benefits users seeking to automate repetitive desktop tasks by providing a system that can understand and execute novel sequences of actions based on learned patterns.

How It Works

The system operates through a four-stage pipeline: Recorder, Learner, Planner, and Actor/Executor. The Recorder captures user interactions, including screen, mouse, and keyboard inputs. The Learner processes these demonstrations to extract semantic action traces, focusing on abstraction rather than rote memorization. The Planner then uses these traces to devise execution strategies for new task variants. Finally, the Actor and Executor reliably perform OS-level actions such as clicks, drags, typing, and scrolling to carry out the planned task. This approach allows a single demonstration to generalize across a family of related tasks.
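The four-stage pipeline described above can be sketched in miniature. All class and function names below are illustrative only and do not come from the ShowUI-Aloha codebase; the real Learner abstracts over raw screen pixels and VLM output rather than copying events one-to-one.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    kind: str        # "click", "type", "scroll", ...
    target: str      # semantic description of the UI element
    payload: str = ""

@dataclass
class Trace:
    """Semantic action trace extracted from one demonstration."""
    steps: list[Event] = field(default_factory=list)

def record_demo() -> list[Event]:
    # Recorder: capture raw screen/mouse/keyboard interactions.
    return [Event("click", "File menu"),
            Event("click", "Save As"),
            Event("type", "filename box", "report.pdf")]

def learn(raw: list[Event]) -> Trace:
    # Learner: abstract raw events into a semantic trace
    # (trivially 1:1 here; the real system generalizes, not memorizes).
    return Trace(steps=list(raw))

def plan(trace: Trace, new_input: str) -> list[Event]:
    # Planner: adapt the learned trace to a new task variant,
    # e.g. substituting the filename that was typed in the demo.
    return [Event(ev.kind, ev.target, new_input) if ev.kind == "type" else ev
            for ev in trace.steps]

def act(steps: list[Event], execute: Callable[[Event], None]) -> None:
    # Actor/Executor: perform each OS-level action in order.
    for ev in steps:
        execute(ev)

# One demonstration generalizes to a new variant of the same task:
trace = learn(record_demo())
steps = plan(trace, "summary.pdf")
act(steps, lambda ev: print(ev.kind, ev.target, ev.payload))
```

The key idea the sketch illustrates is that planning operates on the semantic trace, not the raw recording, which is what lets a single demonstration cover a family of related tasks.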

Quick Start & Requirements

  • Requirements: Windows 10+ or macOS, Python 3.10+, at least one VLM API key (OpenAI or Claude).
  • Installation: Clone the repository, create a Python virtual environment, and install dependencies using pip install -r requirements.txt. API keys must be configured in config/api_keys.json.
  • Recorder: Download the Aloha.Screen.Recorder.exe or Aloha.Screen.Recorder-arm64.dmg from the project's Releases.
  • Usage: Record a demonstration using the Recorder, parse it into a trace using python Aloha_Learn/parser.py {project_name}, and execute tasks via python Aloha_Act/scripts/aloha_run.py --task "Your task" --trace_id "{trace_id}".
  • Links: Repository: https://github.com/showlab/ShowUI-Aloha
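Assembled from the bullets above, a typical first session might look like the following. The repository URL and script paths are taken from the README; the virtual-environment name and the project/trace identifiers are placeholders.

```shell
# Clone the repository and install dependencies
git clone https://github.com/showlab/ShowUI-Aloha
cd ShowUI-Aloha
python -m venv .venv
source .venv/bin/activate        # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Add your OpenAI and/or Claude API keys to config/api_keys.json,
# and record a demonstration with the downloaded Recorder app. Then:

# Parse the recording into a semantic trace ("my_project" is a placeholder)
python Aloha_Learn/parser.py my_project

# Execute a task (possibly a new variant) against the learned trace
python Aloha_Act/scripts/aloha_run.py --task "Your task" --trace_id "my_trace_id"
```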

Highlighted Details

  • Solved 217 of 361 OSWorld-style tasks end-to-end, evaluated with a strict binary success metric.
  • Supports both Windows and macOS operating systems.
  • Features a modular architecture comprising Recorder, Learner, Actor, and Executor components.

Maintenance & Community

The project is associated with an arXiv publication dated 2026, suggesting recent development activity. No specific community channels (like Discord or Slack) or details on maintainers/sponsors are provided in the README.

Licensing & Compatibility

The project is released under the Apache-2.0 License. This license is permissive and generally compatible with commercial use and closed-source linking, allowing for broad adoption.

Limitations & Caveats

Support for Linux is explicitly listed as a future roadmap item, indicating it is not currently supported. The agent's effectiveness is dependent on the quality and representativeness of the human demonstrations provided for learning.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 77 stars in the last 30 days

Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), Gabriel Almeida (Cofounder of Langflow), and 9 more.

Explore Similar Projects

terminal-bench by harbor-framework (2.4%, 2k stars)
Benchmark for LLM agents in real terminal environments
Created 1 year ago, updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Assaf Elovic (Cofounder of Tavily), and 2 more.

XAgent by OpenBMB (0.1%, 9k stars)
Autonomous LLM agent for complex task solving
Created 2 years ago, updated 1 year ago
Starred by Paul Stamatiou (Cofounder of Limitless), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 5 more.

claude-task-master by eyaltoledano (0.4%, 26k stars)
AI-powered task management system for code editors
Created 1 year ago, updated 4 days ago