OminousIndustries
Vision-powered Android phone automation agent
Top 27.4% on SourcePulse
PhoneDriver is a Python-based agent for automating interactions with Android devices. It uses Qwen3-VL vision-language models to interpret the device screen visually and controls the device through ADB commands. Users describe a task in plain English and the agent carries it out, which makes it useful for researchers and power users who want advanced mobile automation.
How It Works
The system captures screenshots of the Android device using ADB. These images are then processed by the Qwen3-VL model, which analyzes the visual layout and UI elements. Based on this analysis and the user's natural language instruction, the agent formulates a sequence of ADB commands (taps, swipes, text input) to execute the desired action. This cycle repeats until the task is completed, providing a dynamic, vision-driven automation approach.
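The cycle described above can be sketched as a small loop. This is a minimal illustration, not PhoneDriver's actual code: the `decide` callback is a hypothetical stand-in for the Qwen3-VL call, and the action dictionary layout (`type`, `x`, `y`, `text`, and so on) is an assumption made for this example. The `adb exec-out screencap -p` and `adb shell input ...` commands are standard ADB usage.

```python
import subprocess
from typing import Callable, Optional


def adb(*args: str) -> bytes:
    """Run an adb command and return its stdout (adb must be on PATH)."""
    return subprocess.run(["adb", *args], check=True, capture_output=True).stdout


def capture_screenshot() -> bytes:
    # `adb exec-out screencap -p` streams the current screen as a PNG.
    return adb("exec-out", "screencap", "-p")


def action_to_adb_args(action: dict) -> list[str]:
    """Translate one action (hypothetical schema) into `adb` arguments."""
    kind = action["type"]
    if kind == "tap":
        return ["shell", "input", "tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        coords = [action["x1"], action["y1"], action["x2"], action["y2"]]
        duration_ms = action.get("duration_ms", 300)
        return ["shell", "input", "swipe", *map(str, coords), str(duration_ms)]
    if kind == "text":
        # `input text` treats %s as a space, so encode spaces that way.
        return ["shell", "input", "text", action["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action type: {kind!r}")


def run_task(
    task: str,
    decide: Callable[[str, bytes], Optional[dict]],
    capture: Callable[[], bytes] = capture_screenshot,
    execute: Callable[[list[str]], object] = lambda args: adb(*args),
    max_steps: int = 20,
) -> int:
    """Repeat capture -> decide -> act until `decide` returns None.

    Returns the number of actions executed.
    """
    for step in range(max_steps):
        screenshot = capture()
        action = decide(task, screenshot)  # model call in a real agent
        if action is None:  # the model reports the task as finished
            return step
        execute(action_to_adb_args(action))
    raise RuntimeError(f"task not finished after {max_steps} steps")
```

Passing `capture` and `execute` as parameters keeps the loop testable without a connected device; a real agent would leave the defaults in place and supply only `decide`.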
Quick Start & Requirements
Install: pip install git+https://github.com/huggingface/transformers and pip install transformers==4.57.0 pillow gradio qwen_vl_utils requests.
Prerequisites: ADB must be installed separately and the Android device connected with USB debugging enabled.
Run: python ui.py, or execute tasks via the command line using python phone_agent.py "your task here".
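Before running the agent, it can help to confirm the ADB prerequisite. The following is a small preflight sketch, not part of PhoneDriver: the function names are hypothetical, and it relies only on the standard `adb devices` command, which lists each attached device's serial and state (`device` once USB debugging is authorized, `unauthorized` otherwise).

```python
import shutil
import subprocess


def parse_adb_devices(output: str) -> dict[str, str]:
    """Map device serial -> state from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        # Skip the header and any "* daemon ..." status lines.
        if line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices() -> list[str]:
    """Return serials of devices that are connected and authorized."""
    if shutil.which("adb") is None:
        raise RuntimeError("adb not found on PATH; install Android platform-tools")
    result = subprocess.run(
        ["adb", "devices"], check=True, capture_output=True, text=True
    )
    states = parse_adb_devices(result.stdout)
    return [serial for serial, state in states.items() if state == "device"]
```

An empty list from `ready_devices()` usually means the cable, the USB debugging setting, or the authorization prompt on the phone needs attention.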
Maintenance & Community
The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or a public roadmap.
Licensing & Compatibility
The project is licensed under the Apache License 2.0. This license is generally permissive for commercial use and integration into closed-source projects, though users should consult the full license text for specific terms.
Limitations & Caveats
The project has significant hardware requirements, notably a GPU with ample VRAM (24GB recommended for the 8B model), which may be a barrier to entry. Installation requires specific versions of libraries, including a potentially unreleased version of transformers (4.57.0), which could lead to setup instability. Users may encounter issues with tap accuracy, necessitating manual configuration of screen resolution.
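Tap accuracy depends on mapping the model's coordinates onto the device's real pixel grid. The sketch below shows one way to do that, under a loudly stated assumption: it treats model output as coordinates normalized to a 0-1000 range, which may not match how PhoneDriver or a given model checkpoint reports positions. The helper names are hypothetical; `adb shell wm size` is the standard command that prints the screen resolution.

```python
import re


def parse_wm_size(output: str) -> tuple[int, int]:
    """Extract (width, height) from `adb shell wm size` output.

    An "Override size" line, when present, takes precedence over
    "Physical size" because it reflects the resolution currently in use.
    """
    sizes = {
        label: (int(w), int(h))
        for label, w, h in re.findall(r"(Physical|Override) size: (\d+)x(\d+)", output)
    }
    if not sizes:
        raise ValueError(f"no screen size found in: {output!r}")
    return sizes.get("Override", sizes.get("Physical"))


def to_device_pixels(
    x_norm: float, y_norm: float, width: int, height: int, scale: float = 1000.0
) -> tuple[int, int]:
    """Convert normalized coordinates (assumed 0..scale) to device pixels."""
    x = round(x_norm / scale * width)
    y = round(y_norm / scale * height)
    # Clamp so a point on the far edge still lands on the screen.
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)


width, height = parse_wm_size("Physical size: 1080x2400\n")
center = to_device_pixels(500, 500, width, height)
```

If taps land consistently off target, comparing the resolution reported by `wm size` with the one configured for the agent is a reasonable first check.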