AppAgent by TencentQQGYLab

LLM-based multimodal agent for smartphone app operation

created 1 year ago
6,050 stars

Top 8.7% on sourcepulse

Project Summary

AppAgent is an LLM-based multimodal agent framework that enables AI to operate smartphone applications by mimicking human touch and swipe interactions. It targets researchers and developers looking to build sophisticated mobile automation tools, offering a novel approach that bypasses the need for direct app backend access.

How It Works

AppAgent utilizes a two-phase process: exploration and deployment. During exploration, the agent autonomously navigates an app or learns from human demonstrations to build a knowledge base of UI elements and their functions. This knowledge base, consisting of documented interactions, is then used by the agent in the deployment phase to execute user-defined tasks on the smartphone. This method allows for generalization across different apps and tasks without requiring explicit API integrations.
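The two-phase idea can be illustrated with a minimal sketch. The class and method names below are hypothetical (the real framework uses an LLM to generate and consume these documents, and stores them per app), but the shape of the workflow is the same: exploration writes down what each UI element does, deployment reads that knowledge back to choose actions.

```python
class UIKnowledgeBase:
    """Hypothetical store of natural-language docs for UI elements, keyed by element id."""

    def __init__(self):
        self.docs = {}

    def record(self, element_id, description):
        # Exploration phase: after tapping or swiping an element and observing
        # the effect, the agent documents what the element does.
        self.docs[element_id] = description

    def lookup(self, element_id):
        # Deployment phase: retrieve prior knowledge to decide the next action.
        return self.docs.get(element_id, "undocumented element")


kb = UIKnowledgeBase()
kb.record("btn_compose", "Opens the new-message screen")
print(kb.lookup("btn_compose"))  # -> Opens the new-message screen
```

Because the knowledge base is plain documentation rather than app-specific API bindings, the same loop generalizes to any app the agent can see and touch.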

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Android Debug Bridge (adb), Android device with USB debugging enabled or Android Studio emulator, Python 3.
  • Configuration: Requires OpenAI API key (GPT-4V) or Alibaba Cloud Dashscope API key (Qwen-VL-Max). GPT-4V usage incurs costs.
  • Resources: Setup involves cloning the repo, installing dependencies, and configuring API keys.
  • Docs: Quick Start, Demo
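The setup steps above can be sketched as the following shell session. The repository URL and config file name are assumptions based on the project's GitHub organization; verify both against the repo's README before running.

```shell
# Clone the repo and install Python dependencies
git clone https://github.com/TencentQQGYLab/AppAgent.git
cd AppAgent
pip install -r requirements.txt

# Add your OpenAI (or Dashscope) API key to the repo's config file
# (config.yaml at the time of writing; check the README).

# Confirm adb can see your device or emulator (USB debugging must be enabled)
adb devices
```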

Highlighted Details

  • Supports autonomous exploration or learning from human demonstrations.
  • Offers a grid overlay for precise UI element interaction.
  • Includes an evaluation benchmark for performance assessment.
  • Recently released AppAgentX with an evolving mechanism.

Maintenance & Community

The project is actively maintained by TencentQQGYLab, with recent updates including AppAgentX and alternative model support. Support is available via GitHub Issues or email.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

Performance depends heavily on the chosen multimodal model (GPT-4V is recommended over Qwen-VL-Max). Manual revision of the generated documentation may be needed for best results.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 297 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Robert Stojnic (Creator of Papers with Code).

Agent-S by simular-ai

Agentic framework for autonomous computer interaction
6k stars · Top 1.2% on sourcepulse
created 9 months ago · updated 1 day ago