MAI-UI  by Tongyi-MAI

Foundation GUI agents for real-world interaction

Created 3 weeks ago

New!

1,396 stars

Top 28.8% on SourcePulse

GitHubView on GitHub
Project Summary

MAI-UI presents a family of foundation GUI agents designed to revolutionize human-computer interaction by enabling realistic deployment. It addresses key challenges like native interaction, UI-only limitations, deployment architecture, and dynamic environments, offering state-of-the-art performance for researchers and engineers in GUI grounding and mobile navigation.

How It Works

MAI-UI employs a unified methodology featuring a self-evolving data pipeline that integrates user interaction and tool calls. Its core innovation lies in a native device-cloud collaboration system that intelligently routes execution based on task state, complemented by an online Reinforcement Learning framework optimized for scaling parallel environments and context length. This approach enhances efficiency and adaptability in complex GUI interactions.

Quick Start & Requirements

Clone the repository (git clone https://github.com/Tongyi-MAI/MAI-UI.git) and navigate into the directory. Install vllm (>=0.11.0) and transformers (>=4.57.0), then install project dependencies (pip install -r requirements.txt). Serve models (e.g., MAI-UI-8B from HuggingFace) using vLLM's OpenAI-compatible API server. Execute demo notebooks (cookbook/grounding.ipynb, cookbook/run_agent.ipynb) after updating the API endpoint configuration. GPU is recommended for model serving.

Highlighted Details

  • Achieves new state-of-the-art (SOTA) on GUI grounding benchmarks, including 73.5% on ScreenSpot-Pro and 91.3% on MMBench GUI L2.
  • Sets new SOTA for mobile GUI navigation with 76.7% on AndroidWorld and 41.7% on MobileWorld.
  • Device-cloud collaboration boosts on-device performance by 33% while reducing cloud API calls by over 40%.
  • Online RL framework demonstrates significant gains through scaling parallel environments and increasing step budgets.

Maintenance & Community

Primary contact points are Hanzhang Zhou, Xu Zhang, and Yue Wang via provided email addresses. The repository does not currently list community channels like Discord/Slack or a public roadmap.

Licensing & Compatibility

MAI-UI is licensed under the Apache License (Version 2.0), which generally permits commercial use and integration into closed-source projects. A NOTICE file details other third-party components with potentially different licenses.

Limitations & Caveats

The provided README focuses on the project's capabilities and performance achievements. It does not explicitly detail known limitations, bugs, or the project's current development stage (e.g., alpha/beta).

Health Check
Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
29
Star History
1,409 stars in the last 27 days

Explore Similar Projects

Starred by Elie Bursztein Elie Bursztein(Cybersecurity Lead at Google DeepMind), Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), and
7 more.

SuperAGI by TransformerOptimus

0.2%
17k
Open-source framework for autonomous AI agent development
Created 2 years ago
Updated 11 months ago
Feedback? Help us improve.