OpenPhone  by HKUDS

Mobile AI agents for on-device smartphone interaction

Created 2 months ago
450 stars

Top 66.8% on SourcePulse

GitHubView on GitHub
Project Summary

<2-3 sentences summarising what the project addresses and solves, the target audience, and the benefit.> OpenPhone tackles the challenges of deploying AI agents on smartphones by introducing a compact, 3B-parameter, on-device vision-language foundation model. It targets developers and researchers seeking privacy-preserving, low-latency, and cost-free mobile AI solutions, enabling agentic capabilities directly on smartphones without cloud dependency.

How It Works

OpenPhone-3B is a vision-language model engineered for edge devices, striking a balance between capability and deployability on mobile NPUs and consumer GPUs. Its core innovation lies in a novel two-stage training approach combining Supervised Fine-Tuning (SFT) with GRPO-style Reinforcement Learning, utilizing synthetic GUI data. This methodology allows the 3B model to achieve performance comparable to larger 7B-9B models, offering significant speed and power efficiency advantages crucial for mobile environments.

Quick Start & Requirements

Evaluation primarily uses the AndroidLab benchmark framework. Recommended setup involves AVD on Mac (arm64). Model deployment and inference leverage pre-configured vLLM scripts. API setup for cloud model credentials is required in evaluation scripts. Detailed guides for model training and data generation are available separately.

Highlighted Details

  • Achieves performance comparable to 7B-9B models with a 3B parameter footprint.
  • Features a device-cloud collaboration framework for dynamic task orchestration and cost-performance optimization.
  • Employs LLM-powered evaluation, enhancing accuracy over traditional rule-based methods.
  • Demonstrates significant inference speed advantages over larger models on constrained hardware.

Maintenance & Community

The project acknowledges related open-source contributions (AndroidLab, R1-V, LLaMA Factory) but does not detail specific maintainers, community channels, or a roadmap within the provided text.

Licensing & Compatibility

This project is released under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

Cloud models still handle approximately 65% of execution steps for complex reasoning. Batch testing scripts require manual transfer of generated evaluation files to prevent path conflicts. API setup for evaluation is currently manual.

Health Check
Last Commit

3 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
0
Star History
409 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.