Windows-MCP.Net by shuyu-labs

.NET server for AI-driven Windows desktop automation

Created 6 months ago
251 stars

Top 99.8% on SourcePulse

Project Summary

Windows-MCP.Net is a .NET server that implements the Model Context Protocol (MCP) so that AI assistants can interact with the Windows desktop environment. It targets developers and power users who want to automate complex desktop tasks, and it serves as the bridge between an AI model and the Windows OS for controlling applications.

How It Works

The project runs as an MCP server on .NET 10.0 and exposes a suite of tools for Windows automation. It translates tool calls from an AI client into direct OS interactions: application launching, UI manipulation, file system operations, OCR, and system controls. The design favors a structured, programmatic interface over ad hoc scripts, so an agent can inspect the current desktop state before deciding which action to take next.
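As a rough illustration of what "translating AI commands" means at the protocol level, the sketch below builds the JSON-RPC 2.0 request that an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the tool name `launch_app` and its argument are hypothetical placeholders, not names confirmed by this project's README.

```python
import json
from itertools import count

# Monotonically increasing request ids, as JSON-RPC expects a unique id per call.
_ids = count(1)


def make_tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, chosen only for illustration.
request = make_tool_call("launch_app", {"name": "notepad"})
print(json.dumps(request, indent=2))
```

The server would answer with a JSON-RPC response carrying the tool's result. The real tool names can be discovered at runtime with the MCP `tools/list` method.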

Quick Start & Requirements

  • Prerequisites: Windows operating system, .NET 10.0 Runtime or higher.
  • Installation: Global installation via dotnet tool install --global WindowsMCP.Net is recommended. Alternatively, it can be run directly from source code for development.
  • Configuration: Requires adding the server configuration to an MCP client, with specific JSON setups provided for both global and source-based execution.
  • Links: .NET 10 download page.
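The configuration step can be sketched as follows. This assumes the `mcpServers` layout that many MCP clients use for stdio servers; the README's exact JSON may differ. The server names, the installed command name, and the project path are placeholders to be replaced with the values from the README.

```python
import json


def mcp_server_entry(server_name, command, args=()):
    """Return a client config fragment registering one stdio MCP server."""
    return {"mcpServers": {server_name: {"command": command, "args": list(args)}}}


# Placeholders, not values from the README: substitute the real command and path.
global_cfg = mcp_server_entry("windows-mcp", "<installed-tool-command>")

# Source-based execution via the .NET CLI's `dotnet run --project`.
source_cfg = mcp_server_entry(
    "windows-mcp-dev", "dotnet", ["run", "--project", "<path-to-project>"]
)

print(json.dumps(global_cfg, indent=2))
print(json.dumps(source_cfg, indent=2))
```

Either fragment would be merged into the MCP client's configuration file. Only one of the two entries is needed at a time.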

Highlighted Details

  • Extensive toolset includes application launching, PowerShell integration, desktop state capture, clipboard, mouse/keyboard operations, window management, web scraping, browser control, screenshots, file system operations, OCR, and system controls (brightness, volume, resolution).
  • Advanced UI element identification supports finding elements by text, class name, or automation ID, with mechanisms for waiting for elements.
  • Comprehensive file system management covers reading, writing, copying, moving, deleting, listing, searching files, and creating/deleting directories.
  • OCR capabilities allow text extraction from screen regions or full screens, text finding, and coordinate retrieval.
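Waiting for a UI element generally comes down to polling a lookup until it succeeds or a deadline passes. The sketch below shows that generic pattern. It is not the project's implementation, and the `probe` callable and the element dictionary are hypothetical stand-ins for a real lookup by text, class name, or automation ID.

```python
import time


def wait_for(probe, timeout=5.0, interval=0.25):
    """Call probe() until it returns a non-None value or the timeout elapses.

    Returns the probe's result, or None if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Simulated lookup: the element "appears" on the third attempt.
attempts = iter([None, None, {"automation_id": "SaveButton", "x": 120, "y": 48}])
element = wait_for(lambda: next(attempts), timeout=1.0, interval=0.0)
print(element)
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted while the loop is waiting.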

Maintenance & Community

The project outlines contribution guidelines and a roadmap with phased development plans. Specific community links (e.g., Discord, Slack) or notable contributors are not detailed in the provided README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license permits commercial use. Requires Windows OS and .NET 10.0.

Limitations & Caveats

The project requires .NET 10.0 and Windows permissions sufficient for its automation functions. Enhanced UI recognition, multi-language OCR, and multimedia processing are not yet available and are planned for future development phases. Users must comply with relevant laws and software agreements; the developers disclaim liability for misuse.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days
