usecomputer by remorses

Desktop automation CLI for AI agents

Created 4 months ago

311 stars

Top 86.4% on SourcePulse

Project Summary

A fast, native desktop automation CLI designed for AI agents, usecomputer enables programmatic control of macOS, Linux (X11), and Windows desktops. It provides the core primitives: taking screenshots, controlling the mouse (move, click, drag, scroll), and synthesizing keyboard input, all implemented in a Zig binary that runs without a Node.js runtime. This makes it well suited to AI agents that must operate graphical user interfaces to complete multi-step tasks.

How It Works

The core of usecomputer is a native Zig binary, and the project also references N-API (Node.js's native addon interface) for its command-line interface; the CLI itself is described as running without a Node.js runtime, which keeps execution lightweight. It exposes fine-grained control over desktop interactions: precise mouse movements and clicks, keyboard input, and screenshot capture. A key design choice is the screenshot scaling and coordinate mapping (coord-map) system, which resizes screenshots to a normalized size and translates coordinates chosen on a screenshot back to absolute screen coordinates, so that an AI model's click targets land on the intended UI elements.
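The coord-map idea can be illustrated with a minimal sketch under a simple assumed model: the screenshot covers a rectangular screen region (the full display or one window), is downscaled so its longest side fits a hypothetical `max_side` limit while keeping the aspect ratio, and points picked on the scaled image are mapped back linearly. The names `fit_within`, `CoordMap`, and `to_screen` are hypothetical; usecomputer's actual scaling rules are not described here.

```python
from dataclasses import dataclass


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side, preserving aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    return round(width * max_side / longest), round(height * max_side / longest)


@dataclass(frozen=True)
class CoordMap:
    """Hypothetical mapping from a scaled screenshot back to absolute screen pixels."""
    origin_x: int  # left edge of the captured region on screen (e.g. a window)
    origin_y: int  # top edge of the captured region
    region_w: int  # captured region size in screen pixels
    region_h: int
    image_w: int   # size of the scaled image the model actually sees
    image_h: int

    def to_screen(self, x: float, y: float) -> tuple[int, int]:
        """Convert a point on the scaled screenshot to absolute screen coordinates."""
        sx = self.origin_x + x * self.region_w / self.image_w
        sy = self.origin_y + y * self.region_h / self.image_h
        return round(sx), round(sy)


# Example: a 2880x1800 window at screen offset (100, 50), downscaled for the model.
img_w, img_h = fit_within(2880, 1800, 1280)  # -> (1280, 800)
cmap = CoordMap(100, 50, 2880, 1800, img_w, img_h)
print(cmap.to_screen(640, 400))  # -> (1540, 950)
```

Keeping the region's origin in the map is what makes window-scoped screenshots usable: a point chosen inside a window image is shifted by the window's position before the click is sent.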

Quick Start & Requirements

Install: npm install -g usecomputer
AI Agent Skill: npx skills add remorses/usecomputer
Requirements:
- macOS: Accessibility permission enabled for the terminal application.
- Linux: X11 session with DISPLAY set (Wayland via XWayland is supported).
- Windows: Must run in an interactive desktop session; automation input is blocked on locked desktops.
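Some of these requirements can be checked before launching an agent. The helper below, `preflight_notes`, is hypothetical and not part of usecomputer: only the Linux `DISPLAY` requirement is detectable from environment variables, so it treats a set `WAYLAND_DISPLAY` without `DISPLAY` as a hint that XWayland is missing (an assumption), and returns the macOS Accessibility and Windows unlocked-desktop requirements as manual reminders.

```python
import os
import platform
from typing import Mapping, Optional


def preflight_notes(system: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable notes about unmet or unverifiable requirements."""
    system = system or platform.system()  # "Linux", "Darwin", or "Windows"
    env = os.environ if env is None else env
    notes: list[str] = []
    if system == "Linux":
        if not env.get("DISPLAY"):
            if env.get("WAYLAND_DISPLAY"):
                # Wayland session but no DISPLAY: XWayland is probably not running.
                notes.append("DISPLAY is unset in a Wayland session; XWayland appears unavailable")
            else:
                notes.append("DISPLAY is unset; no X11 session detected")
    elif system == "Darwin":
        notes.append("Check manually: the terminal app needs Accessibility permission")
    elif system == "Windows":
        notes.append("Check manually: run in an interactive, unlocked desktop session")
    return notes


if __name__ == "__main__":
    for note in preflight_notes():
        print(note)
```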

Highlighted Details

Native Performance: Implemented in Zig for speed and efficiency; no Node.js runtime is needed to run it.
AI Agent Focus: Includes an AI agent skill and supports the screenshot → act → screenshot feedback loop, with features like window-scoped screenshots for improved model accuracy.
Precise Coordinate Mapping: Utilizes a coord-map system to accurately translate coordinates from screenshots to real screen positions, essential for reliable UI interaction.
Inline Screenshots: Supports the Kitty Graphics Protocol via the AGENT_GRAPHICS environment variable for direct image output to stdout, streamlining integration with AI agents.
Advanced Input: Drag commands follow quadratic Bezier curves for natural-looking mouse movement, and multi-line text input is supported via stdin.
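The screenshot → act → screenshot feedback loop can be sketched as a small driver, assuming three hypothetical callables: `capture` (take a screenshot), `decide` (ask a model for the next action), and `act` (perform it). In practice `capture` and `act` would invoke usecomputer, but its exact subcommands and flags are not listed in this summary, so none are shown; the action dictionary shape is likewise an assumption.

```python
from typing import Any, Callable, Dict

Action = Dict[str, Any]  # hypothetical shape, e.g. {"type": "click", "x": 10, "y": 20}


def run_agent_loop(capture: Callable[[], bytes],
                   decide: Callable[[bytes], Action],
                   act: Callable[[Action], None],
                   max_steps: int = 20) -> int:
    """Alternate screenshot -> decision -> action; return the number of actions performed."""
    for step in range(max_steps):
        screenshot = capture()        # fresh view of the screen before every decision
        action = decide(screenshot)   # model chooses the next action from the image
        if action.get("type") == "done":
            return step               # model reports the task is finished
        act(action)                   # execute, then loop back for a new screenshot
    return max_steps                  # step budget exhausted
```

Taking a new screenshot after every action, rather than planning several actions from one image, lets the model notice when a click missed or a dialog appeared; the `max_steps` cap keeps a confused model from looping forever.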
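For context on the inline-screenshot feature: the Kitty Graphics Protocol sends an image as base64 data inside APC escape sequences (`ESC _ G <keys> ; <payload> ESC \`), split into chunks of at most 4096 bytes, with `m=1` on every chunk except the last (`m=0`). The sketch below encodes PNG bytes that way using `f=100` (PNG format) and `a=T` (transmit and display). It illustrates the protocol only; how usecomputer builds its output, and which values AGENT_GRAPHICS accepts, are not covered here.

```python
import base64

CHUNK = 4096  # maximum base64 payload size per escape sequence


def kitty_png_escape(png_bytes: bytes) -> str:
    """Encode PNG bytes as one or more Kitty graphics escape sequences."""
    data = base64.standard_b64encode(png_bytes).decode("ascii")
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [""]
    out = []
    for i, chunk in enumerate(chunks):
        more = 1 if i < len(chunks) - 1 else 0  # m=1: more chunks follow
        # Only the first chunk carries the format/action keys.
        keys = f"f=100,a=T,m={more}" if i == 0 else f"m={more}"
        out.append(f"\x1b_G{keys};{chunk}\x1b\\")
    return "".join(out)
```

Writing the result of `kitty_png_escape(open("shot.png", "rb").read())` to stdout in a Kitty-compatible terminal would display the image inline; the file name is illustrative.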
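A quadratic Bezier drag path is defined by a start point P0, a control point P1, and an end point P2: B(t) = (1 - t)²·P0 + 2(1 - t)t·P1 + t²·P2 for t from 0 to 1. The sketch below samples that curve into intermediate mouse positions; how usecomputer chooses its control point and number of steps is not described here, so both are caller-supplied assumptions, and `quadratic_bezier` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quadratic_bezier(p0: Point, p1: Point, p2: Point, steps: int) -> List[Point]:
    """Sample B(t) = (1-t)^2*p0 + 2(1-t)t*p1 + t^2*p2 at steps+1 evenly spaced t values."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
        y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
        points.append((x, y))
    return points


# Drag from (0, 0) to (100, 0), bowing away from the straight line via control point (50, 100).
path = quadratic_bezier((0, 0), (50, 100), (100, 0), steps=4)
print([(round(x), round(y)) for x, y in path])
```

Sending the sampled points as successive mouse-move events, rather than jumping straight to the target, yields a curved trajectory; more steps give a smoother path at the cost of more input events.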

Licensing & Compatibility

The provided README does not specify a software license. This absence makes it difficult to assess compatibility for commercial use or closed-source linking without further clarification.

Limitations & Caveats

Desktop automation input is blocked on locked Windows sessions. The README does not state an alpha/beta status; its focus on integration examples suggests the core features are intended for real use, but that alone does not establish production readiness.

Explore Similar Projects

Whimbox by nikkigallery

MCPControl by claude-did-this

python-sdk by askui

computer-agent by suitedaces

awesome-gemini-cli by Piebald-AI

agents-in-action by traversaal-ai

hello-halo by openkursar

lil-agents by ryanstephen

Peekaboo by openclaw

Open-Claude-Cowork by DevAgentForge

clawd-on-desk by rullerzhou-afk

hermes-desktop by fathah