midscene  by web-infra-dev

AI operator for web, Android, automation & testing

Created 1 year ago
11,191 stars

Top 4.5% on SourcePulse

Project Summary

Midscene.js positions itself as an AI-powered operator for web and Android automation, enabling users to describe tasks in natural language for interface operation, content validation, and data extraction. It targets developers and testers seeking an intuitive approach to automation, offering a simplified debugging experience and flexible deployment options.

How It Works

Midscene.js uses multimodal Large Language Models (LLMs) to interpret natural language commands and plan UI automation steps. It supports several models, including proprietary ones such as GPT-4o and open-source options such as UI-TARS and Qwen2.5-VL, which are noted specifically for their UI automation performance. For web automation, it runs through a Chrome extension or integrates directly with Puppeteer and Playwright; for Android, it drives devices through its JavaScript SDK and ADB. A key advantage is its focus on debugging: visual reports and a playground let users replay and analyze automation runs.
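The plan-then-execute flow described above can be sketched roughly as follows. This is a minimal illustration, not Midscene.js code: `Step`, `Planner`, `FakePage`, `stubPlanner`, and `runInstruction` are hypothetical names, and the planner is a hard-coded, synchronous stand-in for what would be a multimodal model call that also receives a screenshot.

```typescript
// Hypothetical types for illustration; these are not Midscene.js APIs.
type Step =
  | { kind: "tap"; target: string }
  | { kind: "input"; target: string; text: string };

// A planner maps an instruction to an ordered list of UI steps.
type Planner = (instruction: string) => Step[];

// Minimal in-memory stand-in for a browser page or Android device.
class FakePage {
  readonly log: string[] = [];
  readonly fields = new Map<string, string>();

  tap(target: string): void {
    this.log.push(`tap:${target}`);
  }

  input(target: string, text: string): void {
    this.fields.set(target, text);
    this.log.push(`input:${target}`);
  }
}

// Stub planner standing in for a model call; it only understands "search for ...".
const stubPlanner: Planner = (instruction) => {
  const prefix = "search for ";
  if (!instruction.startsWith(prefix)) {
    throw new Error(`stub planner cannot handle: ${instruction}`);
  }
  const text = instruction.slice(prefix.length);
  return [
    { kind: "input", target: "search box", text },
    { kind: "tap", target: "search button" },
  ];
};

// Plan once, then execute each step in order against the page.
function runInstruction(page: FakePage, planner: Planner, instruction: string): Step[] {
  const steps = planner(instruction);
  for (const step of steps) {
    if (step.kind === "tap") {
      page.tap(step.target);
    } else {
      page.input(step.target, step.text);
    }
  }
  return steps;
}

const page = new FakePage();
runInstruction(page, stubPlanner, "search for headphones");
console.log(page.log.join(", "));
```

Keeping the planner behind a plain function type is one way to swap models without touching the execution loop; whether Midscene.js structures it this way is not stated in the source.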

Quick Start & Requirements

  • Install/Run: Primarily through a Chrome extension for web automation or an Android playground for Android automation. Integration with Puppeteer/Playwright and ADB is also supported via the JavaScript SDK.
  • Prerequisites: Chrome browser for the extension; an Android device for Android automation. Specific LLMs may have their own requirements.
  • Resources: Links to Home Page, Web Browser Automation Quick Experience, Android Automation Quick Experience, API Reference, and Model Choices are provided.

Highlighted Details

  • Supports natural language interaction for task planning and execution.
  • Offers visual reports and a playground for debugging automation processes.
  • Provides caching for improved execution efficiency on repeated tasks.
  • Allows specifying a JSON output format for data extraction, and supports assertions written in natural language.
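To illustrate why caching helps on repeated tasks, here is a minimal sketch of a plan cache keyed by the instruction text. It is an assumption-laden illustration, not Midscene.js's actual cache: `PlanCache`, `PlannedStep`, and `slowPlanner` are hypothetical names, and a real cache would likely also need to account for page state.

```typescript
// Hypothetical cache sketch; Midscene.js's real cache format and keys may differ.
type PlannedStep = { action: string; target: string };

class PlanCache {
  private readonly entries = new Map<string, PlannedStep[]>();
  hits = 0;
  misses = 0;

  // Return the cached plan for a prompt, or compute and store it on a miss.
  getOrPlan(prompt: string, plan: (prompt: string) => PlannedStep[]): PlannedStep[] {
    const key = prompt.trim();
    const cached = this.entries.get(key);
    if (cached !== undefined) {
      this.hits += 1;
      return cached;
    }
    this.misses += 1;
    const steps = plan(prompt);
    this.entries.set(key, steps);
    return steps;
  }
}

let modelCalls = 0;

// Stand-in for an expensive model call; it counts how often it is invoked.
const slowPlanner = (prompt: string): PlannedStep[] => {
  modelCalls += 1;
  return [{ action: "tap", target: prompt.trim() }];
};

const cache = new PlanCache();
cache.getOrPlan("open settings", slowPlanner);
cache.getOrPlan("open settings ", slowPlanner); // trims to the same key, so this is a hit
console.log(modelCalls, cache.hits, cache.misses);
```

The second call reuses the stored plan, so the stand-in model is invoked only once.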

Maintenance & Community

The project is associated with web-infra-dev and credits contributors and projects like Rsbuild, UI-TARS, Qwen2.5-VL, scrcpy, appium-adb, Puppeteer, and Playwright. Community channels include Discord and an X (formerly Twitter) presence.

Licensing & Compatibility

Midscene.js is released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The effectiveness and stability of the automation depend on the chosen LLM's capabilities and the clarity of the natural language instructions. The README does not detail performance benchmarks or comparisons against traditional automation tools.

Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 123
  • Issues (30d): 49

Star History

373 stars in the last 30 days
