Computer-Using-Agent for Windows GUI automation via natural language
Top 31.2% on sourcepulse
PyWinAssistant is an open-source framework for automating Windows GUIs using natural language. It targets users seeking to automate complex workflows without coding, offering a generalist agentic approach that prioritizes symbolic reasoning and OS-native interaction over traditional computer vision. The primary benefit is enabling intuitive, natural language control of desktop applications, making automation accessible and robust against UI changes.
How It Works
PyWinAssistant operates as a Computer-Using-Agent by leveraging Windows UI Automation (UIA) APIs to interact with GUI elements directly through their semantic properties and hierarchical relationships. This "image-free" approach bypasses the need for OCR or pixel-level analysis, enabling efficient and reliable automation. It employs Visualization-of-Thought (VoT) and Chain-of-Thought (CoT) reasoning to understand user intent, plan actions, and simulate synthetic Human-Interface-Device (HID) interactions, allowing for cross-application state awareness and self-healing workflows.
Quick Start & Requirements
pip install -r .\requirements.txt
cd .\core
then python ./assistant.py
/core/core_api.py
and /core/core_imaging.py
.Highlighted Details
Maintenance & Community
The project was publicly released on December 31, 2023, and notes indicate it was being updated as of early 2024. Community links (Discord/Slack) or specific contributor details are not provided in the README.
Licensing & Compatibility
Licensed under MIT. This license is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The system's performance is dependent on the underlying LLM's intelligence and inference speed. Some advanced features like in-step modifiers and memory-content retrieval were intentionally disabled to comply with AI ethics standards. Certain specific app interactions (e.g., sending mail with specific tab navigation) may require updates to the semantic map.
5 months ago
1 day