AppAgentX by Westlake-AGI-Lab

GUI agent research paper implementation

Created 10 months ago

584 stars

Top 55.5% on SourcePulse

Project Summary

AppAgentX introduces an evolutionary framework for GUI agents that aims to improve the efficiency of LLM-based smartphone agents without sacrificing intelligence or flexibility. It targets researchers and developers building LLM agents for mobile applications. The agent automates complex tasks by learning which low-level actions recur and replacing those repetitive sequences with high-level shortcuts.

How It Works

AppAgentX employs a memory mechanism to store task execution history. By analyzing this history, the agent identifies and learns to replace sequences of repetitive low-level operations with high-level "shortcut" actions. This evolutionary process enhances operational efficiency, allowing the agent to focus its reasoning capabilities on novel or complex tasks while streamlining routine operations. The framework leverages LangChain and LangGraph for agent construction, Neo4j for memory storage, and Pinecone for vector storage.
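The shortcut idea described above can be illustrated with a minimal sketch. This is not AppAgentX's actual algorithm or API: the action names, the function names, and the fixed-length n-gram matching are all assumptions chosen for illustration, and the real framework works on a Neo4j-backed memory with LLM reasoning rather than on plain lists of strings.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Action = str  # hypothetical: a low-level action is just a label here


def find_repeated_sequences(
    history: Sequence[Sequence[Action]], length: int, min_count: int = 2
) -> List[Tuple[Action, ...]]:
    """Return action sequences of `length` that appear in at least `min_count` task traces."""
    counts: Counter = Counter()
    for trace in history:
        # Count each sequence at most once per trace.
        seen = {tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)}
        counts.update(seen)
    return [seq for seq, n in counts.most_common() if n >= min_count]


def apply_shortcut(
    trace: Sequence[Action], pattern: Sequence[Action], name: Action
) -> List[Action]:
    """Replace each non-overlapping occurrence of `pattern` in `trace` with one shortcut action."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    pattern = tuple(pattern)
    out: List[Action] = []
    i = 0
    while i < len(trace):
        if tuple(trace[i:i + len(pattern)]) == pattern:
            out.append(name)
            i += len(pattern)
        else:
            out.append(trace[i])
            i += 1
    return out


# Hypothetical execution history: one list of low-level actions per task.
history = [
    ["open_app", "tap_search", "type_query", "tap_result"],
    ["open_app", "tap_search", "type_query", "scroll"],
    ["open_settings", "tap_wifi"],
]
repeated = find_repeated_sequences(history, length=3)
shortened = apply_shortcut(history[0], repeated[0], "search_shortcut")
```

In this toy history, the three-step search sequence occurs in two traces, so it becomes a candidate shortcut and the first trace shrinks from four actions to two.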

Quick Start & Requirements

Installation: pip install -r requirements.txt
Prerequisites:
- Python (the README does not specify a version; the LangChain and LangGraph dependencies suggest a recent Python 3 release)
- Neo4j database
- Pinecone API keys
- Docker with GPU support for the screen recognition and feature extraction services
- Android Debug Bridge (ADB) configured for a connected device, or an Android emulator run through Android Studio
Launch Demo: python demo.py or gradio demo.py
Links: Neo4j, Pinecone, LangChain, LangGraph, ADB, Android Studio Emulator
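Before launching the demo, it can help to confirm that the command-line prerequisites are installed. The following is a small sketch and not part of the AppAgentX repository: the function name find_tools is hypothetical, and it only checks whether the adb and docker executables are on PATH. It does not verify GPU support, a running Neo4j instance, or Pinecone credentials.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str] = ("adb", "docker")) -> Dict[str, Optional[str]]:
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A None entry means the tool needs to be installed or added to PATH before the agent can reach a device or start the backend containers.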

Highlighted Details

- Evolutionary framework for GUI agents to improve efficiency
- Memory mechanism for learning and creating high-level action shortcuts
- Modular design for screen recognition and feature extraction services
- Supports interaction with physical Android devices or emulators

Maintenance & Community

The project is associated with Westlake University's AGI Lab. Contact information for Wenjia Jiang is provided. No specific community channels (Discord/Slack) or roadmap links are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. It mentions the code will be open-sourced, but specific terms, restrictions, or compatibility for commercial use are not detailed.

Limitations & Caveats

The project is presented as an official implementation of a research paper, suggesting it may be experimental. Docker with GPU support is required for backend services, which could be a barrier for some users. Specific Python version requirements are not detailed.

Explore Similar Projects

Awesome-GUI-Agents by ZJU-REAL

factory by Factory-AI

rooroo by marv1nnnnn

xai-gpt-agent-toolkit by XpressAI

experts by metaskills

smartgpt by Cormanz

langmem by langchain-ai

agentic-project-management by sdi2200262

OS-Copilot by OS-Copilot

UI-TARS by bytedance

XAgent by OpenBMB

joyagent-jdgenie by jd-opensource