openpocket  by pockebot

AI agents autonomously operate Android phones for local, private task automation

Created 1 month ago
826 stars

Top 42.9% on SourcePulse

Project Summary

OpenPocket is an open-source framework that lets AI agents operate Android phones autonomously, handling tasks that range from messaging and social media management to gaming and payments. It targets users who want a private, local, always-on AI assistant for their mobile device. The stated benefits are hands-free operation, higher productivity, and data privacy, since all processing stays on the user's machine.

How It Works

OpenPocket's AI agents drive Android devices, both emulators and physical phones, over ADB. The agent simulates human input (tapping, scrolling, and typing) to navigate the device and carry out tasks. The architecture is local-first and private by default, so sensitive data never leaves the user's computer. New capabilities are added by dropping in SKILL.md files, with no code changes required, and a human-authentication relay asks for explicit approval before sensitive actions run.
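The tap, scroll, and type actions map naturally onto ADB's `input` shell command. The sketch below is a minimal illustration, assuming plain `adb shell input` invocations. The `Action` type and the `adbArgs` helper are hypothetical names, not part of OpenPocket, and the source does not say how OpenPocket issues these actions internally.

```typescript
// Hypothetical sketch: Action and adbArgs are illustrative names, not OpenPocket APIs.
type Action =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; x1: number; y1: number; x2: number; y2: number; durationMs: number }
  | { kind: "text"; value: string };

// Build the argument list for the `adb` binary. Passing a serial adds
// `-s <serial>` so the command targets one specific emulator or phone.
function adbArgs(action: Action, serial?: string): string[] {
  const base = [...(serial ? ["-s", serial] : []), "shell", "input"];
  switch (action.kind) {
    case "tap":
      return [...base, "tap", String(action.x), String(action.y)];
    case "swipe":
      return [
        ...base,
        "swipe",
        ...[action.x1, action.y1, action.x2, action.y2, action.durationMs].map(String),
      ];
    case "text":
      // `input text` reads %s as a space. Other shell-special characters
      // would need escaping too; this sketch handles spaces only.
      return [...base, "text", action.value.replace(/ /g, "%s")];
  }
}

// Prints: -s emulator-5554 shell input tap 540 1200
console.log(adbArgs({ kind: "tap", x: 540, y: 1200 }, "emulator-5554").join(" "));
```

Returning an argument array rather than a single command string means the result can be handed to Node's `child_process.execFile("adb", args)` without passing through a local shell.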

Quick Start & Requirements

  • Primary install: npm install -g openpocket (recommended). Alternatively, build from source.
  • Prerequisites: Node.js (implied by the npm install), Android Debug Bridge (ADB) for device interaction.
  • Links: Website, Documentation, Quickstart, Discord, Reddit.

Highlighted Details

  • Multi-model Support: Integrates with a wide range of LLMs including OpenAI GPT-5.x, Claude 4.6, Gemini 3.x, DeepSeek, Qwen, and more.
  • Multi-agent Architecture: Allows running multiple isolated agents, each with its own configuration, workspace, target device, and session state.
  • Scheduled Jobs: Enables creation of cron tasks from natural language commands via chat or CLI.
  • Human-Auth Relay: Provides an explicit approval mechanism for sensitive actions like payments or location access.
  • Skills Framework: Extensible by adding SKILL.md files to define new agent capabilities without code changes.
  • Channel Integrations: Supports task input and result output via Telegram, Discord, WhatsApp, and CLI.
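
The approval step behind the Human-Auth Relay can be pictured as a small policy function. This is a sketch under assumptions, not OpenPocket's implementation: the `Decision` type, the `decide` function, and the category strings are hypothetical. Only the idea that payments and location access need explicit approval comes from the list above.

```typescript
// Hypothetical sketch: names and category strings are assumptions, not OpenPocket APIs.
type Decision = "run" | "ask" | "deny";

// Categories treated as sensitive in this sketch (payments and location access).
const SENSITIVE_CATEGORIES = new Set<string>(["payment", "location"]);

// Decide what to do with an action. `approval` stays undefined until the
// human has answered the approval request.
function decide(category: string, approval?: boolean): Decision {
  if (!SENSITIVE_CATEGORIES.has(category)) return "run"; // no approval needed
  if (approval === undefined) return "ask";              // pause and ask the human
  return approval ? "run" : "deny";
}

console.log(decide("scroll"));         // run
console.log(decide("payment"));        // ask
console.log(decide("payment", false)); // deny
```

Keeping the decision a pure function separates the policy from the transport, so the same check would apply whichever channel carries the approval request.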

Maintenance & Community

The project encourages community contributions and documents its guidelines in CONTRIBUTING.md. Community discussion takes place on Discord and Reddit. The acknowledgements credit dependencies such as the pi-mono ecosystem.

Licensing & Compatibility

This project is licensed under the MIT License, which generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

The README does not state limitations such as alpha status or known bugs. Sensitive actions require explicit human approval through the relay; this is a deliberate design feature that keeps the user in control of critical operations, not a limitation.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 29
  • Issues (30d): 1
  • Star History: 824 stars in the last 30 days
