UFO by microsoft

Desktop AgentOS for automating Windows workflows via natural language

Created 1 year ago
7,610 stars

Top 6.8% on SourcePulse

Project Summary

UFO² (Desktop AgentOS) aims to automate complex, multi-application workflows on Windows using natural language. It targets power users and developers who want to streamline repetitive tasks with AI agents that can interact with both graphical user interfaces and native APIs. The main benefit is automation that combines GUI interaction with direct API calls, going beyond simple UI scripting.

How It Works

UFO² employs a multi-agent architecture, with a HostAgent orchestrating specialized AppAgents. Each AppAgent utilizes a ReAct loop, multimodal perception, and a "Knowledge Substrate" for retrieval-augmented generation (RAG) from diverse sources like documentation, web searches, and execution traces. A key innovation is the "Speculative Executor," which reduces LLM latency by batching and validating predicted actions against live UI states. It also features hybrid control detection, combining UI Automation (UIA) with visual analysis for broader compatibility.
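The speculative execution idea can be illustrated with a minimal Python sketch. It is not UFO's implementation. It assumes a single LLM call has already returned several predicted actions, and it treats "validation" simply as checking that each action's target control still exists and is enabled in a fresh UI snapshot. Control, Action, snapshot_ui, and perform are hypothetical stand-ins for the real perception and execution layers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Control:
    """Hypothetical view of one live UI element."""
    name: str
    enabled: bool = True


@dataclass(frozen=True)
class Action:
    """Hypothetical predicted action that targets a control by name."""
    target: str
    operation: str  # e.g. "click" or "type"


def run_speculative_batch(
    actions: list[Action],
    snapshot_ui: Callable[[], dict[str, Control]],
    perform: Callable[[Action], None],
) -> tuple[list[Action], list[Action]]:
    """Run a batch of predicted actions, re-validating each against live UI state.

    Returns (executed, discarded). Execution stops at the first action whose
    target is missing or disabled; in this sketch the discarded tail would be
    handed back to the LLM for re-planning.
    """
    executed: list[Action] = []
    for i, action in enumerate(actions):
        ui = snapshot_ui()  # re-read state: earlier actions may have changed it
        control = ui.get(action.target)
        if control is None or not control.enabled:
            return executed, actions[i:]
        perform(action)
        executed.append(action)
    return executed, []


if __name__ == "__main__":
    ui = {
        "File": Control("File"),
        "Save": Control("Save"),
        "Export": Control("Export", enabled=False),
    }
    log: list[str] = []
    plan = [Action("File", "click"), Action("Save", "click"),
            Action("Export", "click"), Action("Close", "click")]
    done, rest = run_speculative_batch(
        plan, lambda: dict(ui), lambda a: log.append(f"{a.operation} {a.target}")
    )
    print([a.target for a in done], [a.target for a in rest])
    # prints ['File', 'Save'] ['Export', 'Close']
```

The per-step snapshot is the point of the design in this sketch: a prediction that was valid when the batch was planned can go stale once an earlier action opens a dialog or moves focus, so each action is checked just before it runs.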

Quick Start & Requirements

  • Installation: Clone the repository, then run pip install -r requirements.txt.
  • Prerequisites: Windows 10+ and Python 3.10+ (3.10 and 3.11 are the versions listed). An LLM API key (OpenAI, Azure OpenAI, Qwen, or Gemini) is required for agent functionality; RAG configuration for enhanced knowledge retrieval is optional.
  • Setup: After installing dependencies, configure the LLM API details in ufo/config/config.yaml.
  • Documentation: https://microsoft.github.io/UFO/
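A quick preflight check against the stated prerequisites can save a failed install. The sketch below is a hypothetical helper, not part of UFO: it only compares the OS name, the Windows major release, and the Python version against the minimums listed above (Windows 10+, Python 3.10+). It does not inspect API keys or ufo/config/config.yaml.

```python
import platform
import sys


def check_prerequisites(system: str, release: str, version: tuple[int, int]) -> list[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems: list[str] = []
    if system != "Windows":
        problems.append(f"Windows is required, found {system!r}")
    else:
        # platform.release() gives e.g. "10" or "11"; older releases such as "8.1" fail
        try:
            major = int(release.split(".")[0])
        except ValueError:
            major = 0  # unrecognized release string: treat as unsupported
        if major < 10:
            problems.append(f"Windows 10+ is required, found release {release!r}")
    if version < (3, 10):
        problems.append(f"Python 3.10+ is required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites(platform.system(), platform.release(), sys.version_info[:2])
    print("\n".join(issues) if issues else "Prerequisites look OK")
```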

Highlighted Details

  • Benchmarked on Windows Agent Arena (154 tasks) and OSWorld (49 tasks).
  • Claims up to 51% fewer LLM queries via speculative multi-action execution.
  • Integrates UIA, Win32, and WinCOM for deep OS control.
  • A planned "Picture-in-Picture" mode will isolate agent activity in a virtual desktop.
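The hybrid control detection described under How It Works can be pictured as merging two candidate lists: controls reported by UIA, which carry structured metadata, and controls found by visual analysis, which can cover custom-drawn widgets UIA does not expose. The Python sketch below assumes a simple rule that is not taken from UFO's code: keep every UIA control, and add a visual detection only if its bounding box does not overlap an already kept control above an IoU (intersection over union) threshold. Box, iou, merge_controls, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical screen rectangle in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        # Negative extents (no overlap) clamp to zero area
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter = Box(max(a.left, b.left), max(a.top, b.top),
                min(a.right, b.right), min(a.bottom, b.bottom)).area()
    union = a.area() + b.area() - inter
    return inter / union if union else 0.0


def merge_controls(
    uia: list[tuple[str, Box]],
    visual: list[tuple[str, Box]],
    threshold: float = 0.5,
) -> list[tuple[str, Box]]:
    """Keep all UIA controls; add visual detections not already covered by a kept control."""
    merged = list(uia)
    for label, box in visual:
        if all(iou(box, kept) < threshold for _, kept in merged):
            merged.append((label, box))
    return merged


if __name__ == "__main__":
    uia = [("OK button", Box(0, 0, 100, 40))]
    visual = [("button", Box(5, 0, 100, 40)), ("icon", Box(200, 200, 240, 240))]
    print([label for label, _ in merge_controls(uia, visual)])
    # prints ['OK button', 'icon']
```

Under this assumption UIA wins on overlap because its names and control types are usually more reliable than a visual guess, while vision only fills the gaps.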

Maintenance & Community

  • The project is actively developed by Microsoft; v2.0.0 was released in April 2025.
  • Contact: ufo-agent@microsoft.com (GitHub Issues preferred).
  • Roadmap includes Picture-in-Picture, AgentOS-as-a-Service, and Auto-Debugging.

Licensing & Compatibility

  • Released under the MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The "Picture-in-Picture Desktop" feature is marked as "coming soon" and not yet available. The project disclaimer notes specific terms and conditions regarding functionality and data handling.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 67 stars in the last 30 days

Explore Similar Projects

Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

TagUI by aisingapore

0.1%
6k
Free RPA tool for automating repetitive tasks on websites, desktop apps, and command lines
Created 8 years ago
Updated 6 months ago