ApkClaw by apkclaw-team

Agentic control for Android devices

Created 2 weeks ago

306 stars

Top 87.7% on SourcePulse

Project Summary

ApkClaw is an AI-powered Android automation app that lets LLM agents control a device through natural-language instructions sent over messaging channels such as Discord and Telegram. It targets users and developers who want to automate device operations without scripting each interaction by hand.

How It Works

ApkClaw employs an agent loop that follows an Observe → Think → Act → Verify protocol. It constructs a system prompt with device context and available tools, then iteratively calls an LLM (supporting OpenAI and Anthropic via LangChain4j) to determine actions. Tool execution is managed by a ClawAccessibilityService for UI interaction and a ToolRegistry for mapping abstract tools to device-specific operations. The architecture supports pluggable LLM backends and uses a custom OkHttp adapter for Android compatibility, with mechanisms for loop detection and token optimization.
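The loop described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the project's implementation: AgentLoopSketch, Decision, Model, and the shape of ToolRegistry shown here are hypothetical stand-ins, the LLM call is reduced to an interface, and accessibility-service tools are reduced to functions from a string argument to a string result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Illustrative Observe -> Think -> Act -> Verify loop. All names are hypothetical. */
class AgentLoopSketch {

    /** What the model decided this turn: call a tool, or declare the task finished. */
    static final class Decision {
        final String tool;
        final String argument;
        final boolean done;

        Decision(String tool, String argument, boolean done) {
            this.tool = tool;
            this.argument = argument;
            this.done = done;
        }
    }

    /** Stand-in for the LLM backend (assumed interface, not the LangChain4j API). */
    interface Model {
        Decision think(String systemPrompt, List<String> history);
    }

    /** Maps abstract tool names to concrete operations (assumed registry shape). */
    static final class ToolRegistry {
        private final Map<String, Function<String, String>> tools = new HashMap<>();

        void register(String name, Function<String, String> impl) {
            tools.put(name, impl);
        }

        String execute(String name, String argument) {
            Function<String, String> impl = tools.get(name);
            return impl == null ? "error: unknown tool " + name : impl.apply(argument);
        }
    }

    /** Runs at most maxSteps turns and returns the accumulated history. */
    static List<String> run(Model model, ToolRegistry registry, String task, int maxSteps) {
        // The real system prompt would also carry device context and tool descriptions.
        String systemPrompt = "Task: " + task;
        List<String> history = new ArrayList<>();
        for (int step = 0; step < maxSteps; step++) {
            // Observe + Think: the model sees earlier tool results via the history.
            Decision decision = model.think(systemPrompt, history);
            if (decision.done) {
                history.add("done");
                break;
            }
            // Act: dispatch the abstract tool name through the registry.
            String result = registry.execute(decision.tool, decision.argument);
            // Verify: the result is recorded so the next turn can check its effect.
            history.add(decision.tool + "(" + decision.argument + ") -> " + result);
        }
        return history;
    }
}
```

The maxSteps bound is an assumption added here so the sketch always terminates; the source mentions loop detection but does not say how the loop is capped.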

Quick Start & Requirements

Building requires Java 17+ and Android Studio. Build the project with ./gradlew assembleDebug or ./gradlew assembleRelease, then install the generated APK on a device running Android 9 or later. The app needs the following permissions: Accessibility Service, Notification, System Window, Battery Whitelist, and File Access. Configure the LLM API keys and model names, along with the bot credentials for each messaging channel, in the app's Settings or through the LAN configuration server at http://<device-ip>:9527.

Highlighted Details

  • Supports pluggable LLM providers (OpenAI, Anthropic) with streaming capabilities.
  • Leverages LangChain4j for agent orchestration and tool definition bridging.
  • ClawAccessibilityService provides core device interaction: gestures, UI hierarchy traversal, and key injection.
  • Includes loop detection and token optimization mechanisms for efficient agent operation.
  • Handles protected system dialogs by taking screenshots and notifying the user.
  • Integrates with multiple messaging channels: DingTalk, Feishu, QQ, Discord, Telegram.
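
The source does not say how loop detection works. One common approach, shown here purely as an assumption, is to keep a sliding window of recent action signatures and flag any action that repeats more than a threshold number of times. LoopDetectorSketch, windowSize, and maxRepeats are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical loop detector: flags an action repeated too often in a recent window. */
class LoopDetectorSketch {
    private final int windowSize;
    private final int maxRepeats;
    private final Deque<String> recent = new ArrayDeque<>();

    LoopDetectorSketch(int windowSize, int maxRepeats) {
        this.windowSize = windowSize;
        this.maxRepeats = maxRepeats;
    }

    /** Records an action signature and reports whether it now looks like a loop. */
    boolean recordAndCheck(String actionSignature) {
        recent.addLast(actionSignature);
        // Drop the oldest entry once the window is full.
        if (recent.size() > windowSize) {
            recent.removeFirst();
        }
        int count = 0;
        for (String seen : recent) {
            if (seen.equals(actionSignature)) {
                count++;
            }
        }
        return count > maxRepeats;
    }
}
```

An agent loop could call recordAndCheck with a signature such as the tool name plus its arguments, and stop or re-prompt when it returns true.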

Maintenance & Community

No specific details on maintenance, community channels, or notable contributors were found in the provided README.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0, which is permissive for commercial use and distribution.

Limitations & Caveats

Protected system windows (e.g., permission dialogs) can block UI interaction. When that happens, the agent takes a screenshot, aborts the task, and leaves it for manual intervention. Task locking restricts the system to one task at a time. Screenshot functionality requires Android 11+, even though the app itself installs on Android 9+.
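
The single-task constraint can be pictured as a gate that rejects new work while a task is active. The sketch below is one way to express that with an AtomicBoolean from the Java standard library; SingleTaskGateSketch and tryRun are hypothetical names, and the source does not say whether the project rejects or queues a second task.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical single-task gate: only one task may hold the lock at a time. */
class SingleTaskGateSketch {
    private final AtomicBoolean busy = new AtomicBoolean(false);

    /** Runs the task if none is active; returns false when another task holds the lock. */
    boolean tryRun(Runnable task) {
        // compareAndSet flips the flag atomically, so two callers cannot both win.
        if (!busy.compareAndSet(false, true)) {
            return false; // assumption: reject rather than queue
        }
        try {
            task.run();
            return true;
        } finally {
            // Always release, even if the task throws.
            busy.set(false);
        }
    }
}
```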

Health Check

Last Commit: 6 days ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 2
Star History: 321 stars in the last 14 days
