MobiAgent by IPADS-SAI

Customizable mobile agents for intelligent GUI automation

Created 2 months ago
306 stars

Top 87.4% on SourcePulse

Project Summary

MobiAgent is a framework for building customizable mobile agents that automate GUI tasks on smartphones. It targets researchers and developers who want to build, evaluate, and deploy agents that carry out complex, multi-step tasks on a phone. The system has three parts: an agent model family (MobiMind), an acceleration framework (AgentRR), and an evaluation benchmark (MobiFlow).

How It Works

MobiAgent has a modular architecture with three components: the MobiMind model family handles decision-making and grounding, the AgentRR record-and-replay framework speeds up task execution and supports debugging, and the MobiFlow benchmark suite provides systematic evaluation. A runner component orchestrates agent execution and talks to the device over Android Debug Bridge (ADB). Recent additions include a local experience retrieval module that improves task planning and a mixed-version MobiMind model that handles both Decider and Grounder tasks.
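The ADB interaction described above can be illustrated with a minimal sketch of how a runner might turn a model's chosen action into an `adb` command line. The `Action` record and `to_adb_argv` function are hypothetical names invented for this example, not MobiAgent's actual schema or API. The commands themselves are standard: `adb shell input tap` and `adb shell input swipe` are stock Android, and the `ADB_INPUT_TEXT` broadcast is the text-entry interface documented by the ADBKeyboard app. Whether MobiAgent issues exactly these commands is an assumption.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    """Hypothetical action record; MobiAgent's real schema may differ."""
    kind: str        # "tap", "swipe", or "text"
    x: int = 0       # start (or tap) coordinates in pixels
    y: int = 0
    x2: int = 0      # swipe end coordinates in pixels
    y2: int = 0
    text: str = ""   # text to type for kind == "text"


def to_adb_argv(action: Action, serial: Optional[str] = None) -> List[str]:
    """Build the argv list for one adb invocation (nothing is executed here)."""
    # "-s <serial>" selects a device when several are attached.
    base = ["adb"] + (["-s", serial] if serial else [])
    if action.kind == "tap":
        return base + ["shell", "input", "tap", str(action.x), str(action.y)]
    if action.kind == "swipe":
        return base + ["shell", "input", "swipe",
                       str(action.x), str(action.y),
                       str(action.x2), str(action.y2)]
    if action.kind == "text":
        # adb joins its arguments and hands them to the device shell,
        # so the message is quoted for that remote shell.
        return base + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
                       "--es", "msg", shlex.quote(action.text)]
    raise ValueError(f"unsupported action kind: {action.kind!r}")
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. Building the argv separately from running it keeps the mapping easy to test without a device attached.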

Quick Start & Requirements

  • Installation: pip install -r requirements_simple.txt for basic functionality, or pip install -r requirements.txt for the full pipeline.
  • Prerequisites: Python 3.10, Android Debug Bridge (ADB), the ADBKeyboard app installed on the Android device, and Developer Options with USB debugging enabled. GPU acceleration for OCR requires a paddlepaddle-gpu build that matches the installed CUDA version (e.g., CUDA 11.8).
  • Model Weights: Download OmniParser model weights and deploy Decider, Grounder, and Planner models using vLLM.
  • Links: Paper: [arXiv link provided in text], Huggingface: [Link provided in text], App: [Link provided in text].
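Before installing the heavier dependencies, it can help to confirm the basic prerequisites listed above. The following is a minimal sketch, not part of MobiAgent: the helper names (`python_ok`, `parse_adb_devices`, `ready_serials`, `query_adb`) are invented for this example. It checks the Python version and reads the output of `adb devices`, where an attached phone with USB debugging authorized reports the state `device`.

```python
import subprocess
import sys
from typing import Dict, List, Sequence


def python_ok(version: Sequence[int] = sys.version_info,
              required: Sequence[int] = (3, 10)) -> bool:
    """True if the interpreter's major.minor matches the listed version."""
    return tuple(version[:2]) == tuple(required)


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map device serial -> state from the text printed by `adb devices`."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and "* daemon ..." status lines.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_serials(output: str) -> List[str]:
    """Serials whose state is 'device', i.e. connected and authorized."""
    return [s for s, state in parse_adb_devices(output).items() if state == "device"]


def query_adb() -> str:
    """Run `adb devices` and return its stdout (requires adb on PATH)."""
    result = subprocess.run(["adb", "devices"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

A typical check would be `python_ok() and ready_serials(query_adb())`. A state of `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.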

Highlighted Details

  • Recent updates (both Sept 2025): a local experience retrieval module and an open-sourced mixed MobiMind model.
  • System architecture includes agent_rr/ (Record & Replay), collect/ (Data tools), runner/ (Agent executor), MobiFlow/ (Benchmark), and app/ (Android app).
  • Demos available for the Mobile App and AgentRR (task execution replay).
  • Supports GPU acceleration for OCR via PaddlePaddle.

Maintenance & Community

The project acknowledges support from the National Innovation Institute of High-end Smart Appliances. The README lists no community channels (such as Discord or Slack) and no roadmap.

Licensing & Compatibility

The license type and compatibility for commercial or closed-source use are not specified in the provided text.

Limitations & Caveats

The full pipeline has a heavy dependency footprint, including PyTorch and PaddlePaddle, and the GPU build of PaddlePaddle needs a matching CUDA version. Model deployment requires vLLM, which adds further setup work. No license is stated, which may block adoption. The project targets Android devices only.

Health Check

  • Last commit: 16 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 5
  • Issues (30d): 1
  • Star history: 28 stars in the last 30 days
