Windows-Use by CursorTouch

AI agent for Windows GUI automation

Created 6 months ago

1,538 stars

Top 26.7% on SourcePulse

Project Summary

Windows-Use is an open-source automation agent that lets AI models interact directly with the Windows graphical user interface (GUI). It connects Large Language Models (LLMs) to the Windows operating system, so an agent can open applications, click buttons, type text, and capture UI state without relying solely on traditional computer vision. Developers can use it to add GUI automation to AI-driven applications running on Windows.

How It Works

The agent interacts directly with the Windows GUI layer rather than going through the complex computer vision pipelines typically used for UI automation. Because the OS interaction layer is abstracted away from the model, any LLM can drive the agent, which reduces the dependency on specialized vision models for task execution.
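The separation described above can be pictured as a small loop: the model only sees a text description of the UI and returns the next action, while a desktop backend carries it out. The following is a minimal sketch under stated assumptions, not the Windows-Use implementation. Action, Desktop, run_agent, FakeDesktop, and scripted_model are all hypothetical names chosen for illustration, and the in-memory desktop stands in for the real Windows GUI so the sketch runs on any OS.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Action:
    """One step requested by the model (hypothetical schema)."""
    name: str                                  # e.g. "click", "type", "done"
    args: dict = field(default_factory=dict)


class Desktop(Protocol):
    """What the loop needs from an OS backend (hypothetical interface)."""
    def describe(self) -> str: ...
    def perform(self, action: Action) -> None: ...


# Any callable with this shape can act as the "LLM": (task, ui_state) -> Action.
Model = Callable[[str, str], Action]


def run_agent(task: str, model: Model, desktop: Desktop, max_steps: int = 10) -> list[Action]:
    """Observe the UI, ask the model for one action, execute it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        ui_state = desktop.describe()
        action = model(task, ui_state)
        if action.name == "done":
            break
        desktop.perform(action)
        history.append(action)
    return history


class FakeDesktop:
    """In-memory stand-in for the Windows GUI, for illustration only."""
    def __init__(self) -> None:
        self.text = ""

    def describe(self) -> str:
        return f"Notepad window, text={self.text!r}"

    def perform(self, action: Action) -> None:
        if action.name == "type":
            self.text += action.args["text"]


def scripted_model(task: str, ui_state: str) -> Action:
    """Stand-in for an LLM call: types 'hello' once, then stops."""
    if "hello" not in ui_state:
        return Action("type", {"text": "hello"})
    return Action("done")
```

Swapping scripted_model for a function that calls a real LLM would not require changes to run_agent, which is the point of keeping the model behind a plain callable.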

Quick Start & Requirements

Install command: uv pip install windows-use, or alternatively pip install windows-use.
Prerequisites: Python 3.12 or higher. Example usage requires langchain_google_genai and a compatible LLM (e.g., Gemini 2.0 Flash), along with the Chrome browser.
Links: The primary source of information and code is the GitHub repository: https://github.com/CursorTouch/Windows-Use
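Before running the example, it can help to confirm that the prerequisites listed above are in place. The check below is a minimal sketch using only the standard library. It assumes the package's import name is windows_use (the install name with an underscore), which is not confirmed by this page; check_environment and its parameters are hypothetical names.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 12)
# "windows_use" is an assumed import name; "langchain_google_genai" is the
# module named in the prerequisites for the example usage.
REQUIRED_MODULES = ("windows_use", "langchain_google_genai")


def check_environment(version_info=None, platform=None,
                      find_spec=importlib.util.find_spec) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    version_info = sys.version_info if version_info is None else version_info
    platform = sys.platform if platform is None else platform

    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    if platform != "win32":  # sys.platform is "win32" on Windows
        problems.append(f"Windows required, found platform {platform!r}")
    for module in REQUIRED_MODULES:
        if find_spec(module) is None:
            problems.append(f"missing module: {module}")
    return problems
```

Calling check_environment() with no arguments inspects the current interpreter; the parameters exist only so the check can be exercised without a Windows machine.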

Highlighted Details

Direct GUI Interaction: Enables AI agents to click, type, and execute shell commands directly within the Windows environment.
LLM Agnostic Automation: Designed to abstract OS interactions, allowing diverse LLMs to control the Windows OS without requiring model-specific UI automation components.
UI State Capture: Capable of capturing the current state of the user interface for context-aware decision-making by the AI agent.
Demo Examples: Demonstrates practical use cases including writing notes and saving them, and switching system appearance modes (e.g., dark to light).
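To make the UI State Capture idea concrete, one common way to hand a GUI to a text-only model is to flatten the element tree into a numbered list of interactive controls that the model can refer to by index. The sketch below illustrates that general technique only. It is not taken from Windows-Use, and UIElement, flatten_interactive, and render_state are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class UIElement:
    """A node in a simplified UI tree (hypothetical structure)."""
    role: str
    name: str
    interactive: bool = False
    children: list["UIElement"] = field(default_factory=list)


def flatten_interactive(root: UIElement) -> list[UIElement]:
    """Collect interactive elements in depth-first, top-to-bottom order."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.interactive:
            found.append(node)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))
    return found


def render_state(root: UIElement) -> str:
    """Render one indexed line per interactive element for the model prompt."""
    elements = flatten_interactive(root)
    return "\n".join(f"[{i}] {el.role}: {el.name}" for i, el in enumerate(elements))


# Example tree: a Notepad-like window with two controls and a status bar.
window = UIElement("window", "Notepad", children=[
    UIElement("menu item", "File", interactive=True),
    UIElement("edit", "Text editor", interactive=True),
    UIElement("pane", "Status bar"),
])
```

A model that receives render_state(window) can answer with something like "click element 0", which the agent can resolve back to a concrete control without any image processing.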

Maintenance & Community

Developed by Jeomon George.
Contribution guidelines are available in a CONTRIBUTING file within the repository. No other community channels (e.g., Discord, Slack) or roadmap details are specified in the README.

Licensing & Compatibility

License: MIT License.
Compatibility: The permissive MIT license allows commercial use and integration within closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

Direct OS Manipulation Risk: The agent directly manipulates the Windows GUI, which carries a risk of unintended system behavior or data modification if errors occur.
Sandbox Recommendation: Due to the potential for instability and undesired system changes, running the agent within a sandboxed environment is strongly recommended.
Vision Component: Although it aims to reduce reliance on traditional computer vision, the use_vision=True parameter in example code suggests that visual processing capabilities may still be utilized or configurable.
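In addition to sandboxing, one way to limit the risks above is to place a policy check between the model's requested action and its execution, so that routine actions pass and riskier ones need explicit approval. The following is a minimal sketch of that idea and not a feature of Windows-Use. ActionPolicy, BlockedAction, and the action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


class BlockedAction(Exception):
    """Raised when an action is not permitted by the policy."""


@dataclass(frozen=True)
class ActionPolicy:
    """Allowlist with a confirmation step for riskier actions (hypothetical)."""
    allowed: frozenset = frozenset({"click", "type", "scroll"})
    needs_confirmation: frozenset = frozenset({"shell"})

    def check(self, action_name: str, confirm: Callable[[str], bool]) -> None:
        """Return silently if the action may run; raise BlockedAction otherwise."""
        if action_name in self.allowed:
            return
        if action_name in self.needs_confirmation and confirm(action_name):
            return
        raise BlockedAction(f"action not permitted: {action_name}")


def ask_user(action_name: str) -> bool:
    """Interactive confirmation; replace with any approval mechanism."""
    return input(f"Allow '{action_name}'? [y/N] ").strip().lower() == "y"
```

An agent loop would call policy.check(name, confirm=ask_user) before executing each step. Unknown action names are rejected by default, which is the safer failure mode when a model produces something unexpected.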

Health Check

Last Commit: 23 hours ago
Responsiveness: Inactive
Pull Requests (30d): 3
Issues (30d): 1

Star History

89 stars in the last 30 days

Explore Similar Projects

Awesome-GUI-Agents by ZJU-REAL

A curated collection for developing advanced GUI agents

Created 9 months ago

Updated 1 day ago

MCPControl by claude-did-this

Control Windows desktop via AI

Created 1 year ago

Updated 1 month ago

Starred by Rodrigo Nader (Cofounder of Langflow) and Harrison Chase (Founder of LangChain).

Clevrr-Computer by Clevrr-AI

Automation agent for precise system actions

Created 1 year ago

Updated 1 year ago

Starred by Amanpreet Singh (Cofounder of Contextual AI), Magnus Müller (Cofounder of Browser Use), and 1 more.

vision-agent by askui

Python tool for AI-driven desktop, mobile, and HMI automation

Created 1 year ago

Updated 2 days ago

computer-agent by suitedaces

Desktop app for AI computer control via Claude API

Created 1 year ago

Updated 2 days ago

Peekaboo by steipete

macOS GUI automation and screenshot analysis tool

Created 7 months ago

Updated 1 week ago

ScaleCUA by OpenGVLab

Cross-platform computer use agents for GUI automation

Created 4 months ago

Updated 4 days ago

g3 by dhanji

Rust AI agent for code generation and task automation

Created 3 months ago

Updated 1 day ago

Starred by Chris Tsang (Founder of SeaQL).

terminator by mediar-ai

AI SDK for Windows GUI automation (Playwright-like API)

Created 9 months ago

Updated 1 day ago

pywinassistant by a-real-ai

Computer-Using-Agent for Windows GUI automation via natural language

Created 2 years ago

Updated 11 months ago

Starred by Yaowei Zheng (Author of LLaMA-Factory), Travis Fischer (Founder of Agentic), and 1 more.

UFO by microsoft

Desktop AgentOS for automating Windows workflows via natural language

Created 2 years ago

Updated 5 days ago

Starred by Jason Huggins (Creator of Selenium), Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), and 3 more.

UI-TARS-desktop by bytedance

GUI agent app for computer control via natural language

Created 11 months ago

Updated 6 days ago
