midscene  by web-infra-dev

AI operator for web, Android, automation & testing

created 1 year ago
9,768 stars

Top 5.2% on sourcepulse

Project Summary

Midscene.js positions itself as an AI-powered operator for web and Android automation, enabling users to describe tasks in natural language for interface operation, content validation, and data extraction. It targets developers and testers seeking an intuitive approach to automation, offering a simplified debugging experience and flexible deployment options.

How It Works

Midscene.js uses multimodal Large Language Models (LLMs) to interpret natural language commands and plan UI automation steps. It supports various models, including proprietary ones like GPT-4o and open-source options like UI-TARS and Qwen2.5-VL, which are specifically noted for UI automation performance. On the web, it runs either through a Chrome extension or through integration with browser automation tools (Puppeteer, Playwright); on Android, it drives devices using the JavaScript SDK and ADB. A key advantage is its focus on debugging, offering visual reports and a playground for replaying and analyzing automation processes.
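The flow described above (instruction in, planned steps out, steps dispatched to a backend) can be sketched roughly as follows. This is a minimal illustration only: `PlannedStep`, `Driver`, `RecordingDriver`, `stubPlanner`, and `run` are hypothetical names invented for this sketch and are not Midscene.js APIs. The model call is replaced by a stub that returns a fixed plan.

```typescript
// Hypothetical sketch: these types and functions are illustrative, not Midscene.js APIs.
type PlannedStep =
  | { kind: "tap"; target: string }
  | { kind: "input"; target: string; text: string };

// A driver hides the backend (a browser automation library or ADB) behind one interface.
interface Driver {
  tap(target: string): Promise<void>;
  input(target: string, text: string): Promise<void>;
}

// In-memory driver that only records calls, so the sketch runs without a browser or device.
class RecordingDriver implements Driver {
  readonly log: string[] = [];

  async tap(target: string): Promise<void> {
    this.log.push(`tap:${target}`);
  }

  async input(target: string, text: string): Promise<void> {
    this.log.push(`input:${target}=${text}`);
  }
}

// Stand-in for the multimodal model. A real planner would receive a screenshot
// plus the instruction; this stub always returns the same two-step plan.
async function stubPlanner(instruction: string): Promise<PlannedStep[]> {
  return [
    { kind: "input", target: "search box", text: instruction },
    { kind: "tap", target: "search button" },
  ];
}

// Plan once, then execute each step in order against whichever driver was supplied.
async function run(instruction: string, driver: Driver): Promise<void> {
  const steps = await stubPlanner(instruction);
  for (const step of steps) {
    if (step.kind === "tap") {
      await driver.tap(step.target);
    } else {
      await driver.input(step.target, step.text);
    }
  }
}
```

Keeping the planner and the driver behind separate interfaces is one plausible way to let the same plan run against a browser or an Android device; whether Midscene.js is structured this way internally is not stated in the summary.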

Quick Start & Requirements

  • Install/Run: Primarily through a Chrome extension for web automation or an Android playground for Android automation. Integration with Puppeteer/Playwright and ADB is also supported via the JavaScript SDK.
  • Prerequisites: Chrome browser for the extension; an Android device for Android automation. Specific LLM models may have their own requirements.
  • Resources: The README links to the Home Page, Web Browser Automation Quick Experience, Android Automation Quick Experience, API Reference, and Model Choices.

Highlighted Details

  • Supports natural language interaction for task planning and execution.
  • Offers visual reports and a playground for debugging automation processes.
  • Provides caching for improved execution efficiency on repeated tasks.
  • Allows specifying JSON format for data extraction and natural language assertions.
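
The caching point above can be illustrated with a small sketch: if planning a task requires a model call, storing the resulting steps under a key derived from the instruction lets a repeated task skip that call. Everything here is an assumption made for illustration (`Step`, `PlanCache`, `fakePlanner`, and the choice of cache key are hypothetical); the summary does not describe how Midscene.js implements its cache.

```typescript
// Hypothetical sketch: none of these names come from Midscene.js.
type Step = { action: "tap" | "type" | "assert"; target: string; value?: string };

// Stand-in for a slow model call; it counts invocations so the saving is visible.
let plannerCalls = 0;
function fakePlanner(instruction: string): Step[] {
  plannerCalls += 1;
  return [{ action: "tap", target: instruction }];
}

class PlanCache {
  private store = new Map<string, Step[]>();

  // Key on a page identifier plus the normalized instruction, since the same
  // words can map to different steps on different pages (an assumed design choice).
  private key(instruction: string, pageId: string): string {
    return `${pageId}::${instruction.trim().toLowerCase()}`;
  }

  // Return the cached plan if present; otherwise call the planner and store the result.
  getOrPlan(instruction: string, pageId: string, plan: (i: string) => Step[]): Step[] {
    const k = this.key(instruction, pageId);
    const hit = this.store.get(k);
    if (hit !== undefined) return hit;
    const steps = plan(instruction);
    this.store.set(k, steps);
    return steps;
  }
}

const cache = new PlanCache();
cache.getOrPlan("Click the search button", "home", fakePlanner);
cache.getOrPlan("click the search button ", "home", fakePlanner); // same key after normalization
console.log(plannerCalls); // prints 1
```

A real cache would also need an invalidation rule for when the page changes; that is left out of this sketch.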

Maintenance & Community

The project is associated with web-infra-dev and credits contributors and projects like Rsbuild, UI-TARS, Qwen2.5-VL, scrcpy, appium-adb, Puppeteer, and Playwright. Community channels include Discord and an X (formerly Twitter) presence.

Licensing & Compatibility

Midscene.js is released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The effectiveness and stability of the automation depend on the chosen LLM's capabilities and on the clarity of the natural language instructions provided. The README does not detail performance benchmarks or comparisons against traditional automation tools.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 64
  • Issues (30d): 39
  • Star History: 1,283 stars in the last 90 days
