Open-AutoGLM  by zai-org

AI phone agent framework for automated mobile task execution

Created 1 month ago
21,514 stars

Top 2.0% on SourcePulse

GitHubView on GitHub
Project Summary

An open-source framework for building AI-powered phone assistants, Open-AutoGLM enables multimodal understanding of mobile screens and task automation via natural language commands. It targets developers and researchers seeking to create intelligent agents capable of interacting with mobile applications, offering a path to unlock advanced AI functionalities on everyday smartphones. The project aims to make AI phone capabilities accessible for broader research and development.

How It Works

The system leverages Android Debug Bridge (ADB) for device control, integrating visual language models (VLMs) to interpret screen content and a planning module to generate and execute action sequences. This approach allows the agent to perceive the current UI state, understand user intent expressed in natural language, and autonomously navigate and operate applications. It supports remote ADB debugging over WiFi, enhancing flexibility for development and deployment.

Quick Start & Requirements

  • Primary Install: pip install -r requirements.txt and pip install -e .
  • Prerequisites: Python 3.10+, ADB installed and configured in PATH, Android 7.0+ device/emulator with Developer Mode and USB Debugging enabled, ADB Keyboard installed and enabled on the device.
  • Model Deployment: Requires downloading models (e.g., AutoGLM-Phone-9B) and setting up an inference server (e.g., vLLM) to expose an OpenAI-compatible API.
  • Running: Execute via command line (python main.py --base-url ...) or Python API (from phone_agent import PhoneAgent).
  • Links: Model download links (Hugging Face, ModelScope), ADB download, ADB Keyboard download.

Highlighted Details

  • Offers two primary models: AutoGLM-Phone-9B optimized for Chinese apps and AutoGLM-Phone-9B-Multilingual for broader language support.
  • Supports remote ADB debugging over WiFi/network, eliminating the need for USB connections.
  • Includes support for over 50 mainstream Chinese applications across various categories, with a command to list all supported apps.
  • Provides a comprehensive set of executable actions including Launch, Tap, Type, Swipe, Back, Home, Long Press, Double Tap, Wait, and Take_over for manual intervention.

Maintenance & Community

The provided README does not detail specific maintenance contributors, community channels (like Discord or Slack), or a public roadmap. A WeChat community is mentioned, but no direct link is supplied.

Licensing & Compatibility

The project is explicitly stated to be for "research and learning use only" and prohibits illegal activities. A "Terms of Use" document is referenced, suggesting restrictive licensing. Commercial use or integration into closed-source products is likely not permitted without explicit authorization.

Limitations & Caveats

The project is strictly intended for research and learning purposes, with prohibitions against illegal use. The agent may request manual takeover for sensitive operations like logins or payment screens. Potential issues include Windows encoding errors requiring PYTHONIOENCODING=utf-8 and interactive mode limitations in non-TTY environments.

Health Check
Last Commit

6 days ago

Responsiveness

Inactive

Pull Requests (30d)
56
Issues (30d)
113
Star History
9,439 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.