agent-desktop by lahfir

Desktop automation CLI for AI agents

Created 3 months ago
834 stars

Top 42.2% on SourcePulse

Summary

Agent Desktop is a native command-line interface (CLI), written in Rust, that lets AI agents automate desktop applications. It drives applications through the OS accessibility tree rather than brittle pixel matching or screenshot analysis, giving agents structured, deterministic control. Its progressive skeleton traversal also cuts token usage by 78–96% on dense applications such as Slack or VS Code.

How It Works

The core of agent-desktop is a fast, single-binary Rust CLI that talks to applications through their native accessibility APIs. Every command returns structured JSON, including error codes and recovery hints, so responses are machine-readable. Element references (e.g., @e1) are derived deterministically from accessibility tree snapshots, which lets an AI agent target the same UI element reliably across interactions. To keep token consumption low, the tool uses progressive skeleton traversal: it first emits a shallow overview of the tree, and the agent drills down only into the parts it needs. A C-ABI dynamic library (cdylib) supports in-process integration with languages like Python, Swift, and Go, avoiding the overhead of forking the CLI process for each command.
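The idea behind progressive skeleton traversal and snapshot-derived references can be sketched on a toy tree. This is only an illustration of the concept under stated assumptions: the Node class, the skeleton function, and the output keys (ref, role, name, children, collapsed_children) are hypothetical and do not reflect agent-desktop's actual JSON schema or numbering scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one accessibility-tree element (hypothetical shape)."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def skeleton(root: Node, max_depth: int) -> dict:
    """Return a depth-limited overview with pre-order refs (@e1, @e2, ...)."""
    counter = 0

    def visit(node: Node, depth: int) -> dict:
        nonlocal counter
        counter += 1
        entry = {"ref": f"@e{counter}", "role": node.role, "name": node.name}
        if node.children:
            if depth < max_depth:
                entry["children"] = [visit(c, depth + 1) for c in node.children]
            else:
                # Collapse the subtree to a count; a later, deeper call expands it.
                entry["collapsed_children"] = len(node.children)
        return entry

    return visit(root, 0)


# Hypothetical window: a toolbar with two buttons, plus a text area.
window = Node("window", "Editor", [
    Node("toolbar", "Main", [Node("button", "Run"), Node("button", "Stop")]),
    Node("textarea", "Source"),
])

overview = skeleton(window, max_depth=1)  # shallow: the toolbar is collapsed
full = skeleton(window, max_depth=5)      # deep enough to expand everything
```

In this sketch the shallow overview reports the toolbar as a single entry with a child count, so the agent pays for the two buttons only if it asks for a deeper snapshot. Because numbering follows traversal order, the same tree and depth always produce the same references.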

Quick Start & Requirements

The recommended install is via npm: npm install -g agent-desktop downloads a prebuilt binary automatically. Alternatively, build from source with cargo build --release (requires Rust 1.78+). macOS 13.0+ is required, and Accessibility permissions must be granted via System Settings or by running agent-desktop permissions --request. Prebuilt C-ABI cdylib artifacts are available per release for cross-platform FFI integration.
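An agent harness would typically wrap the CLI rather than call it by hand. The sketch below shows one way to do that in Python, assuming the binary is on PATH after installation. Only the permissions --request subcommand comes from the text above; the helper names run_cli and request_permissions and the injectable runner parameter are hypothetical, and no particular JSON schema is assumed (non-JSON output is returned raw).

```python
import json
import shutil
import subprocess

BINARY = "agent-desktop"


def run_cli(args, runner=subprocess.run):
    """Run one agent-desktop command and try to parse its stdout as JSON."""
    proc = runner([BINARY, *args], capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Assumption: fall back to raw text if a command does not print JSON.
        payload = {"raw": proc.stdout}
    return proc.returncode, payload


def request_permissions(runner=subprocess.run):
    """Invoke the documented `permissions --request` subcommand."""
    return run_cli(["permissions", "--request"], runner=runner)


if __name__ == "__main__":
    if shutil.which(BINARY) is None:
        print("agent-desktop not found on PATH; install it first")
    else:
        print(request_permissions())
```

Passing the runner as a parameter keeps the wrapper testable on machines where the binary is not installed.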

Highlighted Details

  • Token Efficiency: Progressive skeleton traversal reduces token usage by 78–96% on dense applications like Slack or VS Code.
  • Deterministic References: Uses stable element references (@e1, @e2) generated from accessibility snapshots so that AI-driven interactions target the intended element.
  • AX-First Interaction: Tries pure accessibility API strategies first and falls back to mouse events only when needed.
  • Extensive Command Set: Offers 54 commands for observation, interaction, keyboard/mouse control, window management, notifications, and clipboard operations.
  • In-Process FFI: A C-ABI cdylib allows direct, efficient calls from various programming languages without process forking.
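To put the token-efficiency figure in concrete terms, the arithmetic below applies the stated 78–96% reduction range to a snapshot size. The 10,000-token starting point is a hypothetical number chosen for illustration, not a measurement from the project, and the function name is likewise invented here.

```python
def tokens_after_reduction(full_tokens: int, reduction_pct: int) -> int:
    """Tokens left after cutting `reduction_pct` percent (integer math)."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return full_tokens * (100 - reduction_pct) // 100


full = 10_000  # hypothetical full-tree snapshot size, not a measured value
worst = tokens_after_reduction(full, 78)  # low end of the stated range
best = tokens_after_reduction(full, 96)   # high end of the stated range
print(f"{full} tokens -> between {best} and {worst} tokens")
# prints "10000 tokens -> between 400 and 2200 tokens"
```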

Maintenance & Community

The repository is hosted on GitHub at https://github.com/lahfir/agent-desktop. The README does not describe community channels, active maintainers, or a project roadmap.

Licensing & Compatibility

Agent Desktop is licensed under the Apache-2.0 license. This permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Full platform support, including accessibility tree interaction, is currently limited to macOS. Windows and Linux support is marked as "Planned," so those platforms may be non-functional or missing features in the current release.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 18
  • Issues (30d): 3
  • Star History: 186 stars in the last 30 days
