PhoneDriver by OminousIndustries

Vision-powered Android phone automation agent

Created 5 months ago
1,473 stars

Top 27.4% on SourcePulse

Project Summary

PhoneDriver is a Python agent that automates interactions with Android devices. It uses Qwen3-VL vision-language models to interpret the device screen and controls the device through ADB commands. Users describe a task in plain English and the agent carries it out, which makes it useful to researchers and power users who want vision-driven mobile automation.

How It Works

The agent captures a screenshot of the Android device over ADB. The Qwen3-VL model analyzes the screenshot's visual layout and UI elements and, given the user's natural-language instruction, formulates the ADB commands (taps, swipes, text input) needed to carry it out. The agent executes them and repeats the cycle until the task is complete.
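The cycle described above can be sketched roughly as follows. This is a minimal illustration, not PhoneDriver's actual code: the action dictionary schema, the `decide` callable (standing in for the Qwen3-VL call), and the function names are all assumptions made for the example. Only the `adb exec-out screencap -p` and `adb shell input ...` commands are standard ADB usage.

```python
import subprocess
from typing import Callable


def capture_screenshot() -> bytes:
    """Grab a PNG of the current screen with `adb exec-out screencap -p`."""
    result = subprocess.run(
        ["adb", "exec-out", "screencap", "-p"],
        check=True,
        capture_output=True,
    )
    return result.stdout


def action_to_adb_args(action: dict) -> list[str]:
    """Translate one action dict (hypothetical schema) into an adb command line."""
    kind = action["type"]
    if kind == "tap":
        return ["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        return [
            "adb", "shell", "input", "swipe",
            str(action["x1"]), str(action["y1"]),
            str(action["x2"]), str(action["y2"]),
            str(action.get("duration_ms", 300)),
        ]
    if kind == "text":
        # `input text` reads %s as a space; other shell metacharacters
        # would need extra escaping that this sketch does not attempt.
        return ["adb", "shell", "input", "text", action["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action type: {kind!r}")


def run_adb(args: list[str]) -> None:
    """Execute one adb command and raise if it fails."""
    subprocess.run(args, check=True)


def run_task(
    task: str,
    decide: Callable[[bytes, str], dict],
    screenshot: Callable[[], bytes] = capture_screenshot,
    execute: Callable[[list[str]], None] = run_adb,
    max_steps: int = 20,
) -> bool:
    """Screenshot, ask the model for an action, execute it, repeat.

    `decide` is a placeholder for the vision-language model call; it is
    assumed to return {"type": "done"} once the task is finished.
    Returns True if the task finished within `max_steps`.
    """
    for _ in range(max_steps):
        image = screenshot()
        action = decide(image, task)
        if action["type"] == "done":
            return True
        execute(action_to_adb_args(action))
    return False
```

Passing `screenshot` and `execute` in as parameters keeps the loop testable without a connected device, and the `max_steps` cap guards against a model that never reports completion.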

Quick Start & Requirements

  • Installation: Clone the repository, then create and activate a Python virtual environment. Install dependencies with pip install git+https://github.com/huggingface/transformers followed by pip install transformers==4.57.0 pillow gradio qwen_vl_utils requests. ADB must be installed separately, and the Android device must be connected with USB debugging enabled.
  • Prerequisites: Python 3.10+, ADB, an Android device, and a GPU with substantial VRAM (24GB tested for Qwen3-VL-8B).
  • Usage: Launch the interactive Gradio Web UI with python ui.py or execute tasks via the command line using python phone_agent.py "your task here".
  • Links: GitHub Repository
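
Before launching the agent, the prerequisites listed above can be checked with a short script. The sketch below is not part of PhoneDriver; the function names are made up for illustration, and it assumes the usual `adb devices` output, where each attached device appears as a serial followed by its state.

```python
import shutil
import subprocess
import sys


def parse_adb_devices(output: str) -> list[str]:
    """Return serials of devices reported in the `device` (authorized) state."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # A ready device line looks like "<serial>\tdevice"; header lines and
        # "unauthorized"/"offline" entries do not match this shape.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def preflight() -> list[str]:
    """Collect human-readable problems; an empty list means the basics look fine."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    if shutil.which("adb") is None:
        problems.append("adb was not found on PATH")
        return problems  # cannot query devices without adb
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=False
    )
    if not parse_adb_devices(result.stdout):
        problems.append("no authorized device found (is USB debugging enabled?)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("PROBLEM:", problem)
```

The script does not check GPU memory, since that depends on how the model is loaded.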

Highlighted Details

  • Vision-Powered Automation: Utilizes Qwen3-VL for sophisticated visual understanding of Android interfaces.
  • Natural Language Interaction: Allows users to specify tasks using plain English commands.
  • ADB Integration: Direct control over Android devices through the Android Debug Bridge.
  • Web UI: Features an integrated Gradio interface for user-friendly control and monitoring.
  • Real-time Feedback: Provides live screenshots and execution logs during operation.

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or a public roadmap.

Licensing & Compatibility

The project is licensed under the Apache License 2.0. This license is generally permissive for commercial use and integration into closed-source projects, though users should consult the full license text for specific terms.

Limitations & Caveats

The project has significant hardware requirements: a GPU with ample VRAM (24GB was tested with Qwen3-VL-8B), which may be a barrier to entry. Installation depends on specific library versions, including a potentially unreleased version of transformers (4.57.0), which could make setup unstable. Taps may land inaccurately, in which case the screen resolution has to be configured manually.
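
To illustrate why the configured screen resolution affects tap accuracy, here is a minimal sketch of mapping model coordinates onto device pixels. It assumes the model reports positions on a normalized 0 to 1000 grid and that taps are addressed in the override resolution when one is set; neither assumption is confirmed by the source, and the function names are hypothetical. The `Physical size:` and `Override size:` lines are what `adb shell wm size` prints.

```python
import re


def parse_wm_size(output: str) -> tuple[int, int]:
    """Extract (width, height) from `adb shell wm size` output.

    Prefers the override size when present (assumption: input coordinates
    follow the override resolution).
    """
    sizes = dict(re.findall(r"(Physical|Override) size:\s*(\d+x\d+)", output))
    chosen = sizes.get("Override") or sizes.get("Physical")
    if chosen is None:
        raise ValueError("no screen size found in wm size output")
    width, height = (int(part) for part in chosen.split("x"))
    return width, height


def to_pixels(
    x_norm: float, y_norm: float, width: int, height: int, grid: int = 1000
) -> tuple[int, int]:
    """Scale grid coordinates to pixels and clamp them to the screen bounds."""
    x = round(x_norm / grid * width)
    y = round(y_norm / grid * height)
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)
```

If the resolution used for scaling differs from the device's real one, every tap is offset proportionally, which matches the kind of inaccuracy described above.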

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 115 stars in the last 30 days
