GUI agent for Android app operation
AgentCPM-GUI is an on-device LLM agent designed for autonomous operation of Android applications, targeting developers and researchers in mobile AI and robotics. It takes smartphone screenshots as input and uses reinforcement fine-tuning (RFT) to improve reasoning and task execution.
How It Works
Built upon the 8-billion parameter MiniCPM-V model, AgentCPM-GUI processes smartphone screenshots to understand GUI elements and layouts. It excels in GUI grounding through pre-training on a large-scale bilingual Android dataset. Reinforcement fine-tuning (RFT) enables a "think-before-acting" approach, improving success rates on complex tasks. The model utilizes a compact action space and JSON format for efficient on-device inference.
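As a rough illustration of how a caller might consume a compact JSON action with a "think-before-acting" field, the sketch below parses and validates one model output. The field names (thought, action, point, text) and the action names are hypothetical and chosen for illustration only; the model's actual action schema is not described in this summary and may differ.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical action names; the model's real action space may differ.
ALLOWED_ACTIONS = {"tap", "swipe", "type", "press", "wait", "done"}


@dataclass
class AgentAction:
    thought: str                            # reasoning emitted before the action
    action: str                             # one of ALLOWED_ACTIONS
    point: Optional[Tuple[int, int]] = None  # screen coordinates, if any
    text: Optional[str] = None              # text to enter, if any


def parse_action(raw: str) -> AgentAction:
    """Parse one JSON action string and reject malformed or unknown actions."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    point = data.get("point")
    if point is not None:
        is_pair = isinstance(point, list) and len(point) == 2
        if not is_pair or not all(isinstance(v, int) for v in point):
            raise ValueError("point must be a list of two integers")
        point = (point[0], point[1])
    if action == "tap" and point is None:
        raise ValueError("tap requires a point")

    text = data.get("text")
    if action == "type" and not isinstance(text, str):
        raise ValueError("type requires a text string")

    return AgentAction(
        thought=str(data.get("thought", "")),
        action=action,
        point=point,
        text=text,
    )


if __name__ == "__main__":
    raw = '{"thought": "Open settings first.", "action": "tap", "point": [540, 1200]}'
    print(parse_action(raw))
```

Validating the output before dispatching it to the device is a design choice of this sketch, not something the summary attributes to the project; it simply keeps a malformed generation from turning into an unintended tap.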
Quick Start & Requirements
Create a conda environment (conda create -n gui_agent python=3.11), activate it (conda activate gui_agent), and install dependencies (pip install -r requirements.txt).
The model is expected in the model/AgentCPM-GUI directory.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The model's performance is benchmarked on specific datasets; real-world performance may vary. The README does not detail specific hardware requirements beyond CUDA, nor does it mention potential limitations regarding screen resolutions or Android versions.