This project provides an iPhone agent powered by OpenAI's GPT-4.1 model, designed to automate tasks across multiple applications, mimicking human interaction without requiring a jailbreak. It's targeted at developers and power users looking to explore AI-driven mobile automation.
How It Works
The agent leverages Xcode's UI testing harness to inspect and interact with iOS applications and the system. It accesses the app's accessibility tree to perform actions like tapping, swiping, scrolling, and typing. A host app communicates with the UI test via a TCP server to trigger prompts, enabling the agent to execute commands like sending messages, downloading apps, or controlling device features.
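The prompt-triggering handshake described above can be sketched as a loopback demo in Python. This is an illustrative assumption of the shape of the exchange, not the project's actual protocol (which is implemented in Swift): the port, message framing, and "ack" reply are all hypothetical.

```python
import socket
import threading

def agent_server(port_box, ready):
    """Stand-in for the TCP server the UI test runs: accept one
    connection, read a prompt, and acknowledge it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))  # ephemeral port; the real port is project-defined
        srv.listen(1)
        port_box.append(srv.getsockname()[1])
        ready.set()
        conn, _ = srv.accept()
        with conn:
            prompt = conn.recv(1024).decode().strip()
            conn.sendall(f"ack: {prompt}\n".encode())

def send_prompt(port, prompt):
    """Stand-in for the host app: connect and trigger a prompt."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
        c.connect(("127.0.0.1", port))
        c.sendall((prompt + "\n").encode())
        return c.recv(1024).decode().strip()

port_box, ready = [], threading.Event()
threading.Thread(target=agent_server, args=(port_box, ready), daemon=True).start()
ready.wait()
reply = send_prompt(port_box[0], "Send 'on my way' to Alice in Messages")
print(reply)  # ack: Send 'on my way' to Alice in Messages
```

In the real project the server side lives inside the UI test process, which is what lets a received prompt drive accessibility-tree actions on the device.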
Quick Start & Requirements
Run the testLoop function within PhoneAgentUITests.swift from Xcode to start the agent. Because prompts are executed against OpenAI's API, an OpenAI API key is required.
Highlighted Details
Maintenance & Community
This appears to be a personal project developed during an OpenAI hackathon, with no explicit mention of ongoing maintenance or community channels.
Licensing & Compatibility
The license is not specified in the README. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
The agent may struggle with UI elements during animations, doesn't always wait for long-running tasks, and currently lacks image recognition of the screen. Keyboard input and premature task abandonment are noted areas for improvement. This is experimental software, and user data is sent to OpenAI's API.