MAI-UI  by Tongyi-MAI

Foundation GUI agents for real-world interaction

Created 2 months ago
1,681 stars

Top 24.8% on SourcePulse

GitHubView on GitHub
Project Summary

MAI-UI presents a family of foundation GUI agents designed to revolutionize human-computer interaction by enabling realistic deployment. It addresses key challenges like native interaction, UI-only limitations, deployment architecture, and dynamic environments, offering state-of-the-art performance for researchers and engineers in GUI grounding and mobile navigation.

How It Works

MAI-UI employs a unified methodology featuring a self-evolving data pipeline that integrates user interaction and tool calls. Its core innovation lies in a native device-cloud collaboration system that intelligently routes execution based on task state, complemented by an online Reinforcement Learning framework optimized for scaling parallel environments and context length. This approach enhances efficiency and adaptability in complex GUI interactions.

Quick Start & Requirements

Clone the repository (git clone https://github.com/Tongyi-MAI/MAI-UI.git) and navigate into the directory. Install vllm (>=0.11.0) and transformers (>=4.57.0), then install project dependencies (pip install -r requirements.txt). Serve models (e.g., MAI-UI-8B from HuggingFace) using vLLM's OpenAI-compatible API server. Execute demo notebooks (cookbook/grounding.ipynb, cookbook/run_agent.ipynb) after updating the API endpoint configuration. GPU is recommended for model serving.

Highlighted Details

  • Achieves new state-of-the-art (SOTA) on GUI grounding benchmarks, including 73.5% on ScreenSpot-Pro and 91.3% on MMBench GUI L2.
  • Sets new SOTA for mobile GUI navigation with 76.7% on AndroidWorld and 41.7% on MobileWorld.
  • Device-cloud collaboration boosts on-device performance by 33% while reducing cloud API calls by over 40%.
  • Online RL framework demonstrates significant gains through scaling parallel environments and increasing step budgets.

Maintenance & Community

Primary contact points are Hanzhang Zhou, Xu Zhang, and Yue Wang via provided email addresses. The repository does not currently list community channels like Discord/Slack or a public roadmap.

Licensing & Compatibility

MAI-UI is licensed under the Apache License (Version 2.0), which generally permits commercial use and integration into closed-source projects. A NOTICE file details other third-party components with potentially different licenses.

Limitations & Caveats

The provided README focuses on the project's capabilities and performance achievements. It does not explicitly detail known limitations, bugs, or the project's current development stage (e.g., alpha/beta).

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
123 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.