MobiAgent by IPADS-SAI

Customizable mobile agents for intelligent GUI automation

Created 6 months ago

1,740 stars

Top 24.1% on SourcePulse

Project Summary

MobiAgent provides a systematic framework for customizable mobile agents, addressing the need for intelligent automation on mobile devices. It targets researchers and developers seeking to build, evaluate, and deploy sophisticated agents capable of complex task execution on smartphones. The system offers an agent model family (MobiMind), an acceleration framework (AgentRR), and an evaluation benchmark (MobiFlow), enabling enhanced mobile agent capabilities and systematic performance assessment.

How It Works

MobiAgent employs a modular architecture comprising an agent model family (MobiMind) for decision-making and grounding, an agent record and replay framework (AgentRR) for efficient task execution and debugging, and a benchmark suite (MobiFlow) for systematic evaluation. The system leverages Android Debug Bridge (ADB) to interact with mobile devices, orchestrating agent execution through a runner component. Recent enhancements include a local experience retrieval module for improved task planning and a mixed-version MobiMind model supporting both Decider and Grounder tasks.

Quick Start & Requirements

Installation: pip install -r requirements_simple.txt for basic functionality, or pip install -r requirements.txt for the full pipeline.
Prerequisites: Python 3.10, Android Debug Bridge (ADB), ADBKeyboard app on Android device, enabled Developer Options and USB debugging. GPU acceleration for OCR requires paddlepaddle-gpu compatible with specific CUDA versions (e.g., CUDA 11.8).
Model Weights: Download OmniParser model weights and deploy Decider, Grounder, and Planner models using vLLM.
Links: Paper: [arXiv link provided in text], Huggingface: [Link provided in text], App: [Link provided in text].

Highlighted Details

Recent updates include a local experience retrieval module (Sept 2025) and an open-sourced mixed MobiMind model (Sept 2025).
System architecture includes agent_rr/ (Record & Replay), collect/ (Data tools), runner/ (Agent executor), MobiFlow/ (Benchmark), and app/ (Android app).
Demos available for the Mobile App and AgentRR (task execution replay).
Supports GPU acceleration for OCR via PaddlePaddle.

Maintenance & Community

The project acknowledges support from the National Innovation Institute of High-end Smart Appliances. Specific community channels (Discord, Slack) or roadmap links are not provided in the README.

Licensing & Compatibility

The license type and compatibility for commercial or closed-source use are not specified in the provided text.

Limitations & Caveats

The setup for the full pipeline involves significant dependencies, including potentially heavy libraries like PyTorch and PaddlePaddle with specific CUDA requirements. Model deployment requires vLLM, adding another layer of complexity. Licensing information is absent, posing a potential adoption blocker. The primary focus is on Android mobile devices.

MobiAgent by IPADS-SAI

Explore Similar Projects

Android-Lab by THUDM

agent-studio by sxhxliang

deliteAI by NimbleEdge

WindowsAgentArena by microsoft

acu by trycua

droidclaw by unitedbyai

gelab-zero by stepfun-ai

mobile-use by minitap-ai

mobile-mcp by mobile-next

Auto-GPT-ZH by kaqijiang

MobileAgent by X-PLUG

lamda by firerpa