pywinassistant by a-real-ai

Computer-Using-Agent for Windows GUI automation via natural language

created 1 year ago
1,307 stars

Top 31.2% on sourcepulse

Project Summary

PyWinAssistant is an open-source framework for automating Windows GUIs using natural language. It targets users seeking to automate complex workflows without coding, offering a generalist agentic approach that prioritizes symbolic reasoning and OS-native interaction over traditional computer vision. The primary benefit is enabling intuitive, natural language control of desktop applications, making automation accessible and robust against UI changes.

How It Works

PyWinAssistant operates as a Computer-Using-Agent: it uses the Windows UI Automation (UIA) APIs to interact with GUI elements directly through their semantic properties and hierarchical relationships. This "image-free" approach avoids OCR and pixel-level analysis, which the project presents as more efficient and reliable. It employs Visualization-of-Thought (VoT) and Chain-of-Thought (CoT) reasoning to interpret user intent and plan actions, then carries them out through synthetic Human-Interface-Device (HID) input, allowing for cross-application state awareness and self-healing workflows.
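To make the idea of locating elements by semantic properties concrete, here is a minimal sketch in plain Python. It does not call the real UIA APIs and is not taken from the project's code: the `Element` class, the `walk` and `find` helpers, and the toy window tree are all hypothetical stand-ins, chosen only to show how a name plus a control type can identify a control inside a hierarchy without any image analysis.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a UI Automation element (not the project's class)."""
    name: str
    control_type: str
    children: List["Element"] = field(default_factory=list)


def walk(root: Element) -> Iterator[Element]:
    """Yield the root and all of its descendants, depth first."""
    yield root
    for child in root.children:
        yield from walk(child)


def find(root: Element, name: str, control_type: str) -> Optional[Element]:
    """Return the first element matching both semantic properties, or None."""
    for element in walk(root):
        if element.name == name and element.control_type == control_type:
            return element
    return None


# A toy tree loosely resembling a text editor window (assumed for illustration).
window = Element("Untitled - Notepad", "Window", [
    Element("Application", "MenuBar", [
        Element("File", "MenuItem"),
        Element("Edit", "MenuItem"),
    ]),
    Element("Text Editor", "Edit"),
])

# "Edit" is both a menu item's name and the text box's control type,
# so the lookup needs both properties to pick the right element.
target = find(window, "Edit", "MenuItem")
print(target)
```

Matching on name and control type together is what distinguishes the "Edit" menu item from the text box whose control type is also "Edit". A real agent would read these properties from the live UIA tree instead of a hand-built one.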

Quick Start & Requirements

  • Install dependencies: pip install -r .\requirements.txt
  • Run: cd .\core, then python .\assistant.py
  • Requires an OpenAI API key, configured in /core/core_api.py and /core/core_imaging.py.
  • Supports Windows 10/11.

Highlighted Details

  • Bypasses traditional computer vision/OCR for GUI automation, relying solely on UIA metadata.
  • Supports "blind operation" on headless systems and with minimized windows.
  • Claims 100x efficiency gains over traditional methods due to native API access.
  • Implements self-healing workflows that adapt to UI changes.
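
The summary does not say how the self-healing behavior is implemented. One common way to get this kind of resilience is to keep several locators per target and fall back from a brittle one (an internal id) to a semantic one (name plus control type). The sketch below illustrates that pattern only; `by_id`, `by_name_and_type`, `resolve`, and the snapshot dictionaries are hypothetical and are not PyWinAssistant's API.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical flat snapshot of a UI: element id -> semantic properties.
Snapshot = Dict[str, Dict[str, str]]
Locator = Callable[[Snapshot], Optional[str]]


def by_id(element_id: str) -> Locator:
    """Locator that matches an exact element id (brittle across UI updates)."""
    def locate(snapshot: Snapshot) -> Optional[str]:
        return element_id if element_id in snapshot else None
    return locate


def by_name_and_type(name: str, control_type: str) -> Locator:
    """Locator that matches on semantic properties instead of an id."""
    def locate(snapshot: Snapshot) -> Optional[str]:
        for element_id, props in snapshot.items():
            if props["name"] == name and props["type"] == control_type:
                return element_id
        return None
    return locate


def resolve(snapshot: Snapshot, locators: List[Locator]) -> Optional[str]:
    """Try each locator in order and return the first match, or None."""
    for locate in locators:
        found = locate(snapshot)
        if found is not None:
            return found
    return None


send_locators = [by_id("btnSend"), by_name_and_type("Send", "Button")]

old_ui = {"btnSend": {"name": "Send", "type": "Button"}}
# After a hypothetical update the id changed, but name and type did not.
new_ui = {"sendButton2": {"name": "Send", "type": "Button"}}

print(resolve(old_ui, send_locators))
print(resolve(new_ui, send_locators))
```

In this sketch the first lookup succeeds through the id, while the second falls through to the semantic locator, so the workflow step still finds its target after the rename.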

Maintenance & Community

The project was publicly released on December 31, 2023, and notes indicate it was being updated as of early 2024. Community links (Discord/Slack) or specific contributor details are not provided in the README.

Licensing & Compatibility

Licensed under MIT. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The system's performance depends on the intelligence and inference speed of the underlying LLM. Some advanced features, such as in-step modifiers and memory-content retrieval, were intentionally disabled to comply with AI ethics standards. Certain app-specific interactions (e.g., sending mail with specific tab navigation) may require updates to the semantic map.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

16 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Toran Bruce Richards (Founder of AutoGPT), and 2 more.

OS-Copilot by OS-Copilot

OS agent for automating daily tasks

  • Top 0.1%
  • 2k stars
  • created 1 year ago
  • updated 10 months ago

Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

TagUI by aisingapore

Free RPA tool for automating repetitive tasks on websites, desktop apps, and command lines

  • Top 0.1%
  • 6k stars
  • created 8 years ago
  • updated 5 months ago