usecomputer by remorses

Desktop automation CLI for AI agents

Created 1 month ago
274 stars

Top 94.2% on SourcePulse

Project Summary

usecomputer is a fast, native desktop automation CLI designed for AI agents. It enables programmatic control of macOS, Linux (X11), and Windows desktops: taking screenshots, controlling the mouse (move, click, drag, scroll), and synthesizing keyboard input. All of this runs through a Zig binary that needs no Node.js runtime. The tool targets AI agents that must operate graphical user interfaces to complete multi-step tasks.

How It Works

The core of usecomputer is a native Zig binary, so the command-line interface runs without a Node.js runtime. N-API, the Node.js native addon interface, would matter only where the same native code is loaded from Node.js. The binary exposes fine-grained control over desktop interactions: mouse movement and clicks, keyboard input, and screenshots. A key design choice is the screenshot scaling and coordinate mapping (coord-map) system, which normalizes image sizes and translates screenshot-relative coordinates back to absolute screen coordinates. This lets a model pick a point on a scaled image and have the action land on the matching screen position.
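
The coordinate translation described above can be illustrated with a minimal Python sketch. It is not usecomputer's implementation, and the summary does not describe the actual coord-map format: the `CoordMap` class, its field names, and the example numbers are all hypothetical. The sketch only assumes that a screenshot covers a rectangular screen region and may be downscaled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CoordMap:
    """Hypothetical record of how a screenshot relates to the screen."""
    origin_x: int       # left edge of the captured region, in screen pixels
    origin_y: int       # top edge of the captured region, in screen pixels
    region_width: int   # width of the captured region, in screen pixels
    region_height: int  # height of the captured region, in screen pixels
    image_width: int    # width of the (possibly downscaled) screenshot
    image_height: int   # height of the (possibly downscaled) screenshot

    def to_screen(self, image_x: float, image_y: float) -> Tuple[int, int]:
        """Translate a point in screenshot pixels to absolute screen pixels."""
        inside_x = 0 <= image_x <= self.image_width
        inside_y = 0 <= image_y <= self.image_height
        if not (inside_x and inside_y):
            raise ValueError("point lies outside the screenshot")
        # Undo the downscaling, then shift by the region's screen offset.
        scale_x = self.region_width / self.image_width
        scale_y = self.region_height / self.image_height
        return (
            self.origin_x + round(image_x * scale_x),
            self.origin_y + round(image_y * scale_y),
        )


# A 2560x1440 region at screen offset (100, 50), downscaled to 1280x720.
mapping = CoordMap(100, 50, 2560, 1440, 1280, 720)
print(mapping.to_screen(640, 360))  # (1380, 770)
```

Keeping the scale factors per axis means the sketch still works if the image was resized with a different ratio horizontally and vertically.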

Quick Start & Requirements

  • Install: npm install -g usecomputer
  • AI Agent Skill: npx skills add remorses/usecomputer
  • Requirements:
    • macOS: Accessibility permission enabled for the terminal application.
    • Linux: X11 session with DISPLAY set (Wayland via XWayland is supported).
    • Windows: Must run in an interactive desktop session; automation input is blocked on locked desktops.
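
The requirements above can be turned into a small preflight check before an agent starts issuing commands. The following Python sketch is an illustration only: the `preflight` function and its messages are hypothetical and not part of usecomputer. It checks what can be checked from a script (the binary on `PATH`, `DISPLAY` on Linux) and emits reminders for the rest, since the macOS Accessibility permission and the Windows session state are not verified here.

```python
import os
import shutil
import sys
from typing import Dict, List


def preflight(platform: str, env: Dict[str, str], binary: str = "usecomputer") -> List[str]:
    """Return notes about the environment; an empty list means nothing was flagged."""
    notes = []
    # Look the binary up on the PATH taken from the supplied environment.
    if shutil.which(binary, path=env.get("PATH")) is None:
        notes.append(f"{binary} not found on PATH (install: npm install -g usecomputer)")
    if platform.startswith("linux") and not env.get("DISPLAY"):
        notes.append("DISPLAY is not set; an X11 or XWayland session is required")
    if platform == "darwin":
        notes.append("reminder: grant Accessibility permission to the terminal application")
    if platform == "win32":
        notes.append("reminder: run in an interactive, unlocked desktop session")
    return notes


if __name__ == "__main__":
    for note in preflight(sys.platform, dict(os.environ)):
        print(note)
```

Passing the platform and environment as arguments, rather than reading them inside the function, keeps the check easy to exercise on any machine.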

Highlighted Details

  • Native Performance: Implemented in Zig for speed and efficiency, with no Node.js dependency.
  • AI Agent Focus: Includes an AI agent skill and supports the screenshot → act → screenshot feedback loop, with features like window-scoped screenshots for improved model accuracy.
  • Precise Coordinate Mapping: Utilizes a coord-map system to accurately translate coordinates from screenshots to real screen positions, essential for reliable UI interaction.
  • Inline Screenshots: Supports the Kitty Graphics Protocol via the AGENT_GRAPHICS environment variable for direct image output to stdout, streamlining integration with AI agents.
  • Advanced Input: Offers sophisticated drag commands with quadratic Bezier curves for natural mouse movements and supports multi-line text input via stdin.
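
To make the quadratic Bezier drag mentioned above concrete, here is a minimal Python sketch of the underlying math. It samples points along B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2, which is the standard quadratic Bezier formula. How usecomputer chooses the control point, the number of steps, or the timing between them is not stated in the summary, so the function name, `steps` parameter, and example points are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quadratic_bezier(start: Point, control: Point, end: Point, steps: int) -> List[Point]:
    """Sample steps + 1 evenly spaced (in t) points on a quadratic Bezier curve."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Bernstein weights for the start, control, and end points.
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((
            a * start[0] + b * control[0] + c * end[0],
            a * start[1] + b * control[1] + c * end[1],
        ))
    return points


# Drag from (0, 0) to (200, 0), bowed toward the control point (100, 200).
path = quadratic_bezier((0, 0), (100, 200), (200, 0), steps=4)
print(path[0], path[2], path[-1])  # (0.0, 0.0) (100.0, 100.0) (200.0, 0.0)
```

The curve starts and ends exactly at the drag endpoints and only bends toward the control point, which is why a single off-axis control point is enough to avoid a perfectly straight pointer path.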

Licensing & Compatibility

The provided README does not specify a software license. This absence makes it difficult to assess compatibility for commercial use or closed-source linking without further clarification.

Limitations & Caveats

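Desktop automation input is blocked on locked Windows sessions. On Linux, an X11 session is required (Wayland works only through XWayland), and on macOS the terminal application needs Accessibility permission. The README does not state an alpha or beta status. Its integration examples are not in themselves evidence of production readiness, and the project was created only 1 month ago, so maturity should be verified before relying on it.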

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 3
  • Star History: 97 stars in the last 30 days
