zai-org
AI phone agent framework for automated mobile task execution
Top 2.0% on SourcePulse
An open-source framework for building AI-powered phone assistants, Open-AutoGLM enables multimodal understanding of mobile screens and task automation via natural language commands. It targets developers and researchers seeking to create intelligent agents capable of interacting with mobile applications, offering a path to unlock advanced AI functionalities on everyday smartphones. The project aims to make AI phone capabilities accessible for broader research and development.
How It Works
The system leverages Android Debug Bridge (ADB) for device control, integrating visual language models (VLMs) to interpret screen content and a planning module to generate and execute action sequences. This approach allows the agent to perceive the current UI state, understand user intent expressed in natural language, and autonomously navigate and operate applications. It supports remote ADB debugging over WiFi, enhancing flexibility for development and deployment.
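The perceive, plan, act cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the action dictionary shape and the names `adb_argv`, `connect_argv`, `capture_screen`, and `run_episode` are hypothetical, and the planner is passed in as a plain callable rather than a real VLM client. The only parts taken as given are standard adb commands (`adb connect`, `adb exec-out screencap -p`, `adb shell input tap/swipe/keyevent`).

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional

ADB = "adb"  # assumption: the adb binary is on PATH


def connect_argv(host: str, port: int = 5555) -> list[str]:
    """Command line for attaching to a device over Wi-Fi (5555 is the usual port)."""
    return [ADB, "connect", f"{host}:{port}"]


def adb_argv(action: dict, serial: Optional[str] = None) -> list[str]:
    """Translate one hypothetical action dict into an adb command line."""
    base = [ADB] + (["-s", serial] if serial else []) + ["shell", "input"]
    kind = action["type"]
    if kind == "tap":
        return base + ["tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        coords = [str(action[k]) for k in ("x1", "y1", "x2", "y2")]
        return base + ["swipe", *coords, str(action.get("ms", 300))]
    if kind == "back":
        return base + ["keyevent", "KEYCODE_BACK"]
    if kind == "home":
        return base + ["keyevent", "KEYCODE_HOME"]
    raise ValueError(f"unsupported action type: {kind!r}")


def capture_screen(serial: Optional[str] = None) -> bytes:
    """Grab the current screen as PNG bytes (needs a connected device)."""
    argv = [ADB] + (["-s", serial] if serial else []) + ["exec-out", "screencap", "-p"]
    return subprocess.run(argv, check=True, capture_output=True).stdout


def run_episode(
    task: str,
    capture: Callable[[], bytes],
    plan: Callable[[str, bytes, list], dict],
    execute: Callable[[list], object] = lambda argv: subprocess.run(argv, check=True),
    max_steps: int = 10,
) -> list[dict]:
    """Loop: look at the screen, ask the planner for one action, run it."""
    history: list[dict] = []
    for _ in range(max_steps):
        screenshot = capture()                    # perceive the current UI state
        action = plan(task, screenshot, history)  # planner (e.g. a VLM) picks a step
        if action["type"] == "done":              # hypothetical stop signal
            break
        execute(adb_argv(action))                 # act on the device through adb
        history.append(action)
    return history
```

Passing `capture`, `plan`, and `execute` as callables keeps the loop testable without a device: a scripted planner and a list's `append` method can stand in for the model and for adb.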
Quick Start & Requirements
Install: pip install -r requirements.txt and pip install -e .
Model: requires downloading a model (AutoGLM-Phone-9B) and setting up an inference server (e.g., vLLM) to expose an OpenAI-compatible API.
Usage: run from the command line (python main.py --base-url ...) or through the Python API (from phone_agent import PhoneAgent).
Highlighted Details
Models: AutoGLM-Phone-9B, optimized for Chinese apps, and AutoGLM-Phone-9B-Multilingual, for broader language support.
Supported actions: Launch, Tap, Type, Swipe, Back, Home, Long Press, Double Tap, Wait, and Take_over for manual intervention.
Maintenance & Community
The provided README does not detail specific maintenance contributors, community channels (like Discord or Slack), or a public roadmap. A WeChat community is mentioned, but no direct link is supplied.
Licensing & Compatibility
The project is explicitly stated to be for "research and learning use only" and prohibits illegal activities. A "Terms of Use" document is referenced, suggesting restrictive licensing. Commercial use or integration into closed-source products is likely not permitted without explicit authorization.
Limitations & Caveats
The project is intended strictly for research and learning, and illegal use is prohibited. The agent may request manual takeover for sensitive operations such as logins or payment screens. Known issues include encoding errors on Windows, which require setting PYTHONIOENCODING=utf-8, and limited interactive mode in non-TTY environments.
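The two environment caveats can be checked before launching the agent. The sketch below is an assumption about how a wrapper script might handle them, not something the project ships: `utf8_env` and `can_prompt` are hypothetical helper names, while `PYTHONIOENCODING` and `isatty()` are standard Python features.

```python
import os
import sys
from typing import Mapping, Optional


def utf8_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and default PYTHONIOENCODING to utf-8 if unset."""
    env = dict(os.environ if base is None else base)
    env.setdefault("PYTHONIOENCODING", "utf-8")  # respects an explicit user choice
    return env


def can_prompt(stream=None) -> bool:
    """True only when the input stream is an interactive terminal (TTY)."""
    stream = sys.stdin if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    return bool(isatty()) if callable(isatty) else False
```

A wrapper could pass `env=utf8_env()` to `subprocess.run` when starting the agent, and fall back to a non-interactive, single-task invocation when `can_prompt()` returns False, for example under CI or when stdin is piped.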