midscene  by web-infra-dev

AI operator for web, Android, automation & testing

Created 1 year ago
10,321 stars

Top 4.9% on SourcePulse

Project Summary

Midscene.js positions itself as an AI-powered operator for web and Android automation, enabling users to describe tasks in natural language for interface operation, content validation, and data extraction. It targets developers and testers seeking an intuitive approach to automation, offering a simplified debugging experience and flexible deployment options.

How It Works

Midscene.js uses multimodal large language models (LLMs) to interpret natural language commands and plan UI automation steps. It supports proprietary models such as GPT-4o and open-source models such as UI-TARS and Qwen2.5-VL; the latter two are noted specifically for their UI automation performance. For web automation it runs through a Chrome extension or integrates directly with browser automation tools (Puppeteer, Playwright); for Android it uses a JavaScript SDK together with ADB. A key advantage is its focus on debugging: it offers visual reports and a playground for replaying and analyzing automation runs.
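
The plan-then-execute flow described above can be sketched roughly as follows. This is not Midscene.js code: the `Step` type, the `Planner` signature, `stubPlanner`, and `runInstruction` are hypothetical names used only for illustration. The stub stands in for the multimodal LLM call, which in a real system would be asynchronous and would also take a screenshot of the interface as input.

```typescript
// Hypothetical step vocabulary; the real action set of Midscene.js is not
// described in this summary.
type Step =
  | { kind: "tap"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "assert"; condition: string };

// A planner turns an instruction into UI steps. In practice this would be a
// multimodal LLM call; here it is a plain synchronous function.
type Planner = (instruction: string) => Step[];

// Stub planner (assumption): recognises a single hard-coded instruction pattern.
const stubPlanner: Planner = (instruction) => {
  const match = /^search for "(.+)"$/i.exec(instruction.trim());
  if (!match) {
    throw new Error(`stub planner cannot handle: ${instruction}`);
  }
  const query = match[1] ?? "";
  return [
    { kind: "tap", target: "search box" },
    { kind: "type", target: "search box", text: query },
    { kind: "assert", condition: "results are visible" },
  ];
};

// Executor: walks the plan and records what a browser or device driver would do.
function runInstruction(instruction: string, plan: Planner): string[] {
  const log: string[] = [];
  for (const step of plan(instruction)) {
    switch (step.kind) {
      case "tap":
        log.push(`tap ${step.target}`);
        break;
      case "type":
        log.push(`type "${step.text}" into ${step.target}`);
        break;
      case "assert":
        log.push(`check that ${step.condition}`);
        break;
    }
  }
  return log;
}

const actions = runInstruction('search for "headphones"', stubPlanner);
console.log(actions.join("\n"));
```

Separating the planner from the executor mirrors the division the summary describes: the model decides what to do, while a conventional driver (Puppeteer, Playwright, or ADB) carries it out.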

Quick Start & Requirements

  • Install/Run: Primarily through a Chrome extension for web automation or an Android playground for Android automation. Integration with Puppeteer/Playwright and ADB is also supported via the JavaScript SDK.
  • Prerequisites: Chrome browser for the extension; an Android device for Android automation. Specific LLM models may have their own requirements.
  • Resources: Links to Home Page, Web Browser Automation Quick Experience, Android Automation Quick Experience, API Reference, and Model Choices are provided.

Highlighted Details

  • Supports natural language interaction for task planning and execution.
  • Offers visual reports and a playground for debugging automation processes.
  • Provides caching for improved execution efficiency on repeated tasks.
  • Allows specifying JSON format for data extraction and natural language assertions.
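
The caching point above can be illustrated with a small memoisation sketch: planned steps are stored per instruction so that a repeated task skips the expensive planning call. How Midscene.js actually keys, stores, or invalidates its cache is not described in this summary; `PlanCache`, `planWithCache`, and `countingPlanner` are hypothetical names.

```typescript
// Hypothetical cache of planned steps, keyed by the instruction text.
// A real cache would likely also need to account for page state; omitted here.
class PlanCache {
  private readonly plans = new Map<string, string[]>();
  hits = 0;
  misses = 0;

  planWithCache(
    instruction: string,
    planner: (instruction: string) => string[],
  ): string[] {
    const cached = this.plans.get(instruction);
    if (cached !== undefined) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    const plan = planner(instruction);
    this.plans.set(instruction, plan);
    return plan;
  }
}

// Stand-in for an LLM planning call (assumption); counts how often it runs.
let plannerCalls = 0;
const countingPlanner = (instruction: string): string[] => {
  plannerCalls += 1;
  return [`plan for: ${instruction}`];
};

const cache = new PlanCache();
cache.planWithCache("click the login button", countingPlanner);
cache.planWithCache("click the login button", countingPlanner); // served from cache
cache.planWithCache("open the settings page", countingPlanner);

console.log({ plannerCalls, hits: cache.hits, misses: cache.misses });
```

In this sketch the second identical instruction never reaches the planner, which is where the efficiency gain on repeated tasks would come from.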

Maintenance & Community

The project is associated with web-infra-dev and credits contributors and projects like Rsbuild, UI-TARS, Qwen2.5-VL, scrcpy, appium-adb, Puppeteer, and Playwright. Community channels include Discord and an X (formerly Twitter) presence.

Licensing & Compatibility

Midscene.js is released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The effectiveness and stability of the automation are dependent on the chosen LLM's capabilities and the clarity of the natural language instructions provided. Specific performance benchmarks or comparisons against traditional automation tools are not detailed in the README.

Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 112
  • Issues (30d): 51

Star History

  • 328 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 15 more.

stagehand by browserbase

AI browser automation framework for production

  • Rank: top 0.5%
  • Stars: 17k
  • Created 1 year ago
  • Updated 1 day ago