LLM-powered tool to control computers via simulated input
Top 19.7% on sourcepulse
This project provides a self-driving interface for computers, enabling users to control their machines using Large Language Models (LLMs) like GPT-4o or Gemini. It's designed for users who want to automate complex tasks or interact with their computer through natural language commands, offering an "autopilot" experience across macOS, Linux, and Windows.
How It Works
The core approach sends user requests to an LLM backend, which breaks the task down into executable steps. The application then simulates keyboard and mouse input to perform those actions. To stay accurate on dynamic interfaces, it captures screenshots of the screen's current state and feeds them back to the LLM for course correction, creating a feedback loop that runs until the task is complete.
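The loop above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `ask_llm`, `capture_screen`, and `run` are hypothetical stand-ins, and the LLM and screen-capture calls are stubbed out.

```python
# Hypothetical sketch of the screenshot-feedback loop: plan -> act -> observe -> repeat.

def ask_llm(goal: str, screenshot: bytes) -> list[str]:
    # Stub: a real backend (e.g. GPT-4o) would return the next input
    # actions given the goal and the current screen state.
    if b"done" in screenshot:
        return []                      # task complete, no further steps
    return ["click 120 340", "type hello"]

def capture_screen(round_no: int) -> bytes:
    # Stub: a real implementation would grab the primary display.
    return b"done" if round_no > 0 else b"in progress"

def run(goal: str, max_rounds: int = 5) -> list[str]:
    executed = []
    for round_no in range(max_rounds):
        steps = ask_llm(goal, capture_screen(round_no))
        if not steps:                  # LLM reports the goal is reached
            break
        executed.extend(steps)         # here the real tool would simulate
                                       # the keyboard/mouse input
    return executed
```

The `max_rounds` cap is one plausible guard against the loop running forever when the model never reports completion.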
Quick Start & Requirements
pip install -r requirements.txt
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The system struggles with accurate spatial reasoning, making precise clicking and interaction with tabular data (like spreadsheets) difficult. It also has limitations in navigating complex GUI-rich applications that heavily rely on cursor actions. The tool currently only processes the primary display when multiple monitors are in use.