Tongyi-MAI: Foundation GUI agents for real-world interaction
Top 28.8% on SourcePulse
MAI-UI presents a family of foundation GUI agents designed to advance human-computer interaction by enabling realistic deployment. It addresses key challenges such as native interaction, UI-only limitations, deployment architecture, and dynamic environments, offering researchers and engineers state-of-the-art performance in GUI grounding and mobile navigation.
How It Works
MAI-UI employs a unified methodology featuring a self-evolving data pipeline that integrates user interaction and tool calls. Its core innovation lies in a native device-cloud collaboration system that intelligently routes execution based on task state, complemented by an online Reinforcement Learning framework optimized for scaling parallel environments and context length. This approach enhances efficiency and adaptability in complex GUI interactions.
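The README does not include code for this routing logic, so the following is a purely illustrative sketch: the names (TaskState, route_step) and the threshold heuristic are assumptions, not part of MAI-UI's API. It only pictures the idea of dispatching each agent step to an on-device model or a cloud endpoint based on the current task state.

```python
# Illustrative sketch only: MAI-UI's actual device-cloud routing API is not
# documented in the README. All names and thresholds here are hypothetical.
from dataclasses import dataclass


@dataclass
class TaskState:
    step_index: int           # number of UI actions taken so far
    requires_tool_call: bool  # e.g., needs a web search or external API
    ui_complexity: float      # heuristic score of the current screen (0..1)


def route_step(state: TaskState, complexity_threshold: float = 0.7) -> str:
    """Decide where the next agent step should execute.

    Simple steps stay on-device for latency and privacy; tool calls and
    complex screens are escalated to the cloud model.
    """
    if state.requires_tool_call or state.ui_complexity > complexity_threshold:
        return "cloud"
    return "device"


# Example: a simple tap on a familiar screen stays on-device.
print(route_step(TaskState(step_index=3, requires_tool_call=False, ui_complexity=0.2)))
```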
Quick Start & Requirements
1. Clone the repository (git clone https://github.com/Tongyi-MAI/MAI-UI.git) and change into the directory.
2. Install vllm (>=0.11.0) and transformers (>=4.57.0), then install the project dependencies (pip install -r requirements.txt).
3. Serve a model (e.g., MAI-UI-8B from HuggingFace) using vLLM's OpenAI-compatible API server.
4. Update the API endpoint configuration, then run the demo notebooks (cookbook/grounding.ipynb, cookbook/run_agent.ipynb).
A GPU is recommended for model serving.
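As a minimal sketch of what pointing a client at the served model can look like, the snippet below queries a locally running vLLM OpenAI-compatible server with the official openai Python client. The base URL, port, and model identifier are assumptions; match them to your own vLLM launch command and the notebooks' endpoint configuration.

```python
# Minimal sketch, assuming vLLM is serving MAI-UI-8B locally via its
# OpenAI-compatible API server (default port 8000). The base_url and model
# name below are assumptions; adjust them to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="Tongyi-MAI/MAI-UI-8B",  # hypothetical identifier; use the name you served
    messages=[
        {"role": "user", "content": "Describe the UI element to tap to open Settings."}
    ],
)
print(response.choices[0].message.content)
```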
Highlighted Details
Maintenance & Community
Primary contacts are Hanzhang Zhou, Xu Zhang, and Yue Wang, reachable via the email addresses provided in the repository. The repository does not currently list community channels such as Discord/Slack or a public roadmap.
Licensing & Compatibility
MAI-UI is licensed under the Apache License, Version 2.0, which generally permits commercial use and integration into closed-source projects. A NOTICE file lists third-party components that may carry different licenses.
Limitations & Caveats
The provided README focuses on the project's capabilities and performance achievements. It does not explicitly detail known limitations, bugs, or the project's current development stage (e.g., alpha/beta).