AgentCPM-GUI by OpenBMB

GUI agent for Android app operation

Created 9 months ago

1,313 stars

Top 30.1% on SourcePulse

Project Summary

AgentCPM-GUI is an on-device LLM agent designed for autonomous operation of Android applications, targeting developers and researchers in mobile AI and robotics. It enhances reasoning and task execution by processing smartphone screenshots and leveraging reinforcement fine-tuning (RFT).

How It Works

Built upon the 8-billion parameter MiniCPM-V model, AgentCPM-GUI processes smartphone screenshots to understand GUI elements and layouts. It excels in GUI grounding through pre-training on a large-scale bilingual Android dataset. Reinforcement fine-tuning (RFT) enables a "think-before-acting" approach, improving success rates on complex tasks. The model utilizes a compact action space and JSON format for efficient on-device inference.

Quick Start & Requirements

Install: Clone the repository, create a conda environment (conda create -n gui_agent python=3.11), activate it (conda activate gui_agent), and install dependencies (pip install -r requirements.txt).
Model: Download the AgentCPM-GUI model from Hugging Face and place it in the model/AgentCPM-GUI directory.
Hardware: Requires a CUDA-enabled GPU for inference.
Links: Demo Case, CAGUI Dataset

Highlighted Details

Achieves state-of-the-art performance on various grounding benchmarks, outperforming models like Qwen2.5-VL-7B and GPT-4o.
First open-source GUI agent fine-tuned for Chinese apps, supporting over 30 popular titles.
Demonstrates strong performance on both English and Chinese app control benchmarks.
Supports both Hugging Face Transformers and vLLM for inference.

Maintenance & Community

Developed by THUNLP, Renmin University of China, and ModelBest.
Evaluation data and code are open-sourced.

Licensing & Compatibility

Code is released under the Apache-2.0 license.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The model's performance is benchmarked on specific datasets; real-world performance may vary. The README does not detail specific hardware requirements beyond CUDA, nor does it mention potential limitations regarding screen resolutions or Android versions.

AgentCPM-GUI by OpenBMB

Explore Similar Projects

Awesome-GUI-Agents by ZJU-REAL

Aria-UI by AriaUI

ScaleCUA by OpenGVLab

AppAgentX by Westlake-AGI-Lab

CogAgent by zai-org

ShowUI by showlab

gelab-zero by stepfun-ai

mobile-use by minitap-ai

MobiAgent by IPADS-SAI

UI-TARS by bytedance

AppAgent by TencentQQGYLab

droidrun by droidrun