UFO by microsoft

Desktop AgentOS for automating Windows workflows via natural language

created 1 year ago
7,513 stars

Top 7.1% on sourcepulse

Project Summary

UFO² (Desktop AgentOS) aims to automate complex, multi-application workflows on Windows using natural language. It targets power users and developers who want to streamline repetitive tasks with AI agents that can interact with both graphical user interfaces and native APIs. Its main benefit is automation that combines GUI interaction with native API calls rather than relying on UI scripting alone.

How It Works

UFO² employs a multi-agent architecture, with a HostAgent orchestrating specialized AppAgents. Each AppAgent utilizes a ReAct loop, multimodal perception, and a "Knowledge Substrate" for retrieval-augmented generation (RAG) from diverse sources like documentation, web searches, and execution traces. A key innovation is the "Speculative Executor," which reduces LLM latency by batching and validating predicted actions against live UI states. It also features hybrid control detection, combining UI Automation (UIA) with visual analysis for broader compatibility.
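The speculative execution idea described above can be illustrated with a minimal sketch. This is not UFO²'s implementation: the `Action` type, the `run_speculative_batch` function, and the dictionary standing in for live UI state are all hypothetical names chosen for illustration. The sketch assumes a batch of predicted actions is run in order, and that each one is validated against a fresh UI snapshot before it executes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """One predicted UI step (hypothetical structure, not UFO's own type)."""
    control: str    # name of the control the action targets
    operation: str  # e.g. "click" or "type"


def run_speculative_batch(
    actions: List[Action],
    get_ui_state: Callable[[], Dict[str, bool]],
    execute: Callable[[Action], None],
) -> int:
    """Execute predicted actions in order, re-checking the live UI before each.

    Stops at the first action whose target control is missing or disabled
    and returns how many actions ran. A caller would then query the LLM
    again for the remaining steps.
    """
    executed = 0
    for action in actions:
        state = get_ui_state()  # fresh snapshot: earlier actions may have changed the UI
        if not state.get(action.control, False):
            break  # the prediction no longer matches the UI, so stop speculating
        execute(action)
        executed += 1
    return executed


# Toy stand-in for a live UI: control name -> enabled flag (assumed for the demo).
fake_ui = {"File": True, "Save As": False}
log: List[str] = []


def fake_execute(action: Action) -> None:
    log.append(f"{action.operation}:{action.control}")
    if action.control == "File":
        fake_ui["Save As"] = True  # opening the menu enables the menu item


plan = [Action("File", "click"), Action("Save As", "click"), Action("Export", "click")]
count = run_speculative_batch(plan, lambda: dict(fake_ui), fake_execute)
print(count, log)
```

In this toy run the first two actions pass validation and the third is rejected because no "Export" control exists, so one model call covers two steps instead of one call per step.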

Quick Start & Requirements

  • Installation: Clone the repository, then run pip install -r requirements.txt.
  • Prerequisites: Windows 10+ and Python 3.10+ (3.10 and 3.11 are named explicitly). LLM API keys (OpenAI, Azure OpenAI, Qwen, Gemini) are required for agent functionality. RAG configuration is optional and enhances knowledge retrieval.
  • Setup: Consists of cloning the repository, installing dependencies, and configuring LLM API details in ufo/config/config.yaml.
  • Documentation: https://microsoft.github.io/UFO/
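The prerequisites listed above can be checked before a first run with a short script. This is a sketch and not part of UFO: `check_prerequisites` is a hypothetical helper, it only distinguishes Windows from other platforms (it does not verify the Windows 10+ requirement), and it does not validate API keys inside the config file.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def check_prerequisites(
    platform_name: str,
    python_version: Tuple[int, int],
    config_path: Path,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # sys.platform is "win32" on Windows, including 64-bit installs.
    if not platform_name.startswith("win"):
        problems.append("Windows 10+ is required")
    if python_version < (3, 10):
        problems.append("Python 3.10 or later is required")
    # Path taken from the setup step above.
    if not config_path.is_file():
        problems.append(f"missing config file: {config_path}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites(
        sys.platform, sys.version_info[:2], Path("ufo/config/config.yaml")
    )
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("basic prerequisites look fine")
```

Run it from the root of the cloned repository so the relative config path resolves.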

Highlighted Details

  • Benchmarked on Windows Agent Arena (154 tasks) and OSWorld (49 tasks).
  • Claims up to 51% fewer LLM queries via speculative multi-action execution.
  • Integrates UIA, Win32, and WinCOM for deep OS control.
  • Future "Picture-in-Picture" mode isolates agent activity in a virtual desktop.

Maintenance & Community

  • Actively developed by Microsoft; version v2.0.0 was released in April 2025.
  • Contact: ufo-agent@microsoft.com; GitHub Issues are preferred.
  • Roadmap includes Picture-in-Picture, AgentOS-as-a-Service, and Auto-Debugging.

Licensing & Compatibility

  • Released under the MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The "Picture-in-Picture Desktop" feature is marked as "coming soon" and is not yet available. The project disclaimer sets out terms and conditions covering functionality and data handling.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30 days): 0
  • Issues (30 days): 2
  • Star history: 558 stars in the last 90 days
