OpenPhone by HKUDS

Mobile AI agents for on-device smartphone interaction

Created 4 months ago

702 stars

Top 48.6% on SourcePulse

Project Summary

<2-3 sentences summarising what the project addresses and solves, the target audience, and the benefit.> OpenPhone tackles the challenges of deploying AI agents on smartphones by introducing a compact, 3B-parameter, on-device vision-language foundation model. It targets developers and researchers seeking privacy-preserving, low-latency, and cost-free mobile AI solutions, enabling agentic capabilities directly on smartphones without cloud dependency.

How It Works

OpenPhone-3B is a vision-language model engineered for edge devices, striking a balance between capability and deployability on mobile NPUs and consumer GPUs. Its core innovation lies in a novel two-stage training approach combining Supervised Fine-Tuning (SFT) with GRPO-style Reinforcement Learning, utilizing synthetic GUI data. This methodology allows the 3B model to achieve performance comparable to larger 7B-9B models, offering significant speed and power efficiency advantages crucial for mobile environments.

Quick Start & Requirements

Evaluation primarily uses the AndroidLab benchmark framework. Recommended setup involves AVD on Mac (arm64). Model deployment and inference leverage pre-configured vLLM scripts. API setup for cloud model credentials is required in evaluation scripts. Detailed guides for model training and data generation are available separately.

Highlighted Details

Achieves performance comparable to 7B-9B models with a 3B parameter footprint.
Features a device-cloud collaboration framework for dynamic task orchestration and cost-performance optimization.
Employs LLM-powered evaluation, enhancing accuracy over traditional rule-based methods.
Demonstrates significant inference speed advantages over larger models on constrained hardware.

Maintenance & Community

The project acknowledges related open-source contributions (AndroidLab, R1-V, LLaMA Factory) but does not detail specific maintainers, community channels, or a roadmap within the provided text.

Licensing & Compatibility

This project is released under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

Cloud models still handle approximately 65% of execution steps for complex reasoning. Batch testing scripts require manual transfer of generated evaluation files to prevent path conflicts. API setup for evaluation is currently manual.

OpenPhone by HKUDS

Explore Similar Projects

awesome-mobile-llm by stevelaskaridis

Android-Lab by THUDM

youtu-tip by TencentCloudADP

droidclaw by unitedbyai

MAI-UI by Tongyi-MAI

gelab-zero by stepfun-ai

cookbook by Liquid4All

mobile-use by minitap-ai

MobiAgent by IPADS-SAI

mobile-mcp by mobile-next

AppAgent by TencentQQGYLab

Open-AutoGLM by zai-org