tencentmusic: Lightweight UI automation agent driven by natural language
Top 51.8% on SourcePulse
This project provides a lightweight UI automation agent, PageEyes Agent, designed to execute Web and Android UI tasks through natural language commands, eliminating the need for manual scripting. It targets engineers and testers seeking efficient, script-free automation for tasks like testing and UI inspection across different platforms. The primary benefit is enabling complex UI interactions via simple text instructions, streamlining the automation workflow.
How It Works
PageEyes Agent is built on the Pydantic AI framework and uses the OmniParserV2 model to perceive on-screen element information. A key design advantage is that it does not depend on large vision-language models: because OmniParser supplies the element information, a smaller LLM can handle path planning. This improves flexibility and reduces computational overhead. The agent runs on Web and Android, with iOS support planned. It works with various LLMs, defaulting to DeepSeek V3, and supports natural-language assertions with detailed execution logging and reporting.
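To illustrate why a perception model lets a smaller, text-only LLM do the planning, here is a minimal sketch under stated assumptions. Parsed screen elements are rendered as a plain-text list that an LLM could reason over, and the chosen element's bounding box is converted into a tap coordinate. The `Element` schema, the normalized-bbox convention, and `pick_target` (a keyword match standing in for the LLM call) are hypothetical. This is not PageEyes Agent's actual code or OmniParserV2's exact output format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    """One UI element as a perception model might report it (hypothetical schema)."""
    index: int
    kind: str                                # e.g. "text" or "icon"
    content: str                             # OCR text or icon caption
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]
    interactive: bool


def describe_elements(elements: list[Element]) -> str:
    """Render elements as plain text so a text-only LLM can plan over them."""
    lines = []
    for e in elements:
        flag = "clickable" if e.interactive else "static"
        lines.append(f"[{e.index}] {e.kind} '{e.content}' ({flag})")
    return "\n".join(lines)


def tap_point(e: Element, width: int, height: int) -> tuple[int, int]:
    """Convert a normalized bounding box to the pixel center to tap or click."""
    x1, y1, x2, y2 = e.bbox
    return round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height)


def pick_target(elements: list[Element], keyword: str) -> Element | None:
    """Hypothetical stand-in for the LLM planner: first interactive match."""
    for e in elements:
        if e.interactive and keyword.lower() in e.content.lower():
            return e
    return None


# Usage with a made-up two-element screen on a 1080x2400 display.
screen = [
    Element(0, "text", "Welcome", (0.1, 0.05, 0.9, 0.1), False),
    Element(1, "icon", "Login button", (0.4, 0.8, 0.6, 0.9), True),
]
print(describe_elements(screen))
target = pick_target(screen, "login")
assert target is not None
print(tap_point(target, 1080, 2400))  # (540, 2040)
```

In a real agent, the text from `describe_elements` would go into the LLM prompt together with the user's instruction. The model would return an element index, and the executor would translate it into a click or tap.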
Quick Start & Requirements
Install with pip install page-eyes, or clone the repository and run uv sync to install dependencies. Configuration requires the LLM model (AGENT_MODEL), an API key (OPENAI_API_KEY), and the API endpoint (OPENAI_BASE_URL), plus optional settings such as debug mode (AGENT_DEBUG) and headless operation (AGENT_HEADLESS). Cloud storage credentials for Tencent Cloud COS or MinIO (COS_SECRET_ID, MINIO_ENDPOINT, etc.) are required for logging and reporting.
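Before a run, it can help to confirm that the settings listed above are present. The following is a minimal pre-flight check, assuming the settings are supplied as environment variables. The variable names come from the text above. The accepted boolean spellings, the COS-before-MinIO precedence, and the `load_settings` helper itself are hypothetical and not part of page-eyes. Only one variable per storage backend is checked, because the full credential lists are not given here.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for boolean flags


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Interpret an optional flag such as AGENT_DEBUG; unset means False."""
    return env.get(name, "").strip().lower() in TRUTHY


@dataclass
class AgentSettings:
    model: Optional[str]     # AGENT_MODEL; page-eyes defaults to DeepSeek V3
    api_key: str             # OPENAI_API_KEY
    base_url: Optional[str]  # OPENAI_BASE_URL
    debug: bool              # AGENT_DEBUG
    headless: bool           # AGENT_HEADLESS
    storage: str             # "cos" or "minio" (hypothetical label)


def load_settings(env: Mapping[str, str] = os.environ) -> AgentSettings:
    """Hypothetical pre-flight check; raises ValueError if required values are missing."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    # Logging and reporting need one storage backend; COS is checked first (assumption).
    if env.get("COS_SECRET_ID"):
        storage = "cos"
    elif env.get("MINIO_ENDPOINT"):
        storage = "minio"
    else:
        raise ValueError("configure Tencent Cloud COS or MinIO credentials")
    return AgentSettings(
        model=env.get("AGENT_MODEL"),
        api_key=api_key,
        base_url=env.get("OPENAI_BASE_URL"),
        debug=env_flag(env, "AGENT_DEBUG"),
        headless=env_flag(env, "AGENT_HEADLESS"),
        storage=storage,
    )


# Usage with placeholder values (not real credentials or model names).
demo_env = {
    "AGENT_MODEL": "your-model-name",
    "OPENAI_API_KEY": "sk-placeholder",
    "OPENAI_BASE_URL": "https://llm.example.com/v1",
    "AGENT_HEADLESS": "true",
    "MINIO_ENDPOINT": "localhost:9000",
}
settings = load_settings(demo_env)
print(settings.storage, settings.headless, settings.debug)  # minio True False
```

Passing a plain mapping instead of reading `os.environ` directly keeps the check easy to test. Calling `load_settings()` with no argument validates the real environment.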
Maintenance & Community
The project provides a contribution guide and encourages users to submit issues for feature ideas or bug reports. A community group is mentioned for further discussion. The provided text does not name maintainers or sponsors and gives no public roadmap beyond planned platform support.
Licensing & Compatibility
The provided documentation does not explicitly state the software license. Usage rights therefore need clarification, particularly for commercial applications or integration into closed-source projects.
Limitations & Caveats
Currently, the agent supports Web and Android, with iOS support pending. It requires configuring external services such as OmniParser and cloud storage, along with valid API keys for the chosen LLM, which may incur costs. The lack of a stated license is a significant barrier to adoption.
2 weeks ago
Inactive