AgentCPM-GUI  by OpenBMB

GUI agent for Android app operation

created 2 months ago
947 stars

Top 39.5% on sourcepulse

GitHubView on GitHub
Project Summary

AgentCPM-GUI is an on-device LLM agent designed for autonomous operation of Android applications, targeting developers and researchers in mobile AI and robotics. It enhances reasoning and task execution by processing smartphone screenshots and leveraging reinforcement fine-tuning (RFT).

How It Works

Built upon the 8-billion parameter MiniCPM-V model, AgentCPM-GUI processes smartphone screenshots to understand GUI elements and layouts. It excels in GUI grounding through pre-training on a large-scale bilingual Android dataset. Reinforcement fine-tuning (RFT) enables a "think-before-acting" approach, improving success rates on complex tasks. The model utilizes a compact action space and JSON format for efficient on-device inference.

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create -n gui_agent python=3.11), activate it (conda activate gui_agent), and install dependencies (pip install -r requirements.txt).
  • Model: Download the AgentCPM-GUI model from Hugging Face and place it in the model/AgentCPM-GUI directory.
  • Hardware: Requires a CUDA-enabled GPU for inference.
  • Links: Demo Case, CAGUI Dataset

Highlighted Details

  • Achieves state-of-the-art performance on various grounding benchmarks, outperforming models like Qwen2.5-VL-7B and GPT-4o.
  • First open-source GUI agent fine-tuned for Chinese apps, supporting over 30 popular titles.
  • Demonstrates strong performance on both English and Chinese app control benchmarks.
  • Supports both Hugging Face Transformers and vLLM for inference.

Maintenance & Community

  • Developed by THUNLP, Renmin University of China, and ModelBest.
  • Evaluation data and code are open-sourced.

Licensing & Compatibility

  • Code is released under the Apache-2.0 license.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The model's performance is benchmarked on specific datasets; real-world performance may vary. The README does not detail specific hardware requirements beyond CUDA, nor does it mention potential limitations regarding screen resolutions or Android versions.

Health Check
Last commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
17
Star History
958 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.