page-agent by alibaba

GUI agent for controlling web interfaces with natural language

Created 3 months ago
575 stars

Top 56.2% on SourcePulse

Project Summary

PageAgent is a JavaScript library that transforms webpages into interactive agents controllable via natural language. It targets web developers and users seeking intuitive, client-side automation of web interfaces, enabling task execution through simple text commands without requiring server-side infrastructure.

How It Works

PageAgent runs primarily on the client side: it extracts relevant DOM information and handles natural language instructions inside the browser, which reduces server load and keeps page handling local to the user. The agent is configured with an LLM endpoint (modelName, baseURL, apiKey), so model inference depends on whichever endpoint is configured. The architecture is split into distinct packages for core agent logic, DOM manipulation, and a user interface layer that uses masks and animations for human-in-the-loop interaction.
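
To make the DOM extraction step more concrete, here is a minimal sketch of how a page could be reduced to a short, indexed list of interactive elements for a language model to reason about. It is not PageAgent's actual implementation: the function names, the set of tags treated as interactive, and the output format are all assumptions made for illustration.

```javascript
// Hypothetical sketch: NOT PageAgent's real extraction logic.
// The tags treated as interactive are an assumption for illustration.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea']);

// Walk a DOM-like tree and collect a flat, indexed list of interactive elements.
// Accepts real Elements or plain objects with the same shape
// (tagName, children, textContent, optional getAttribute).
function extractInteractiveElements(root) {
  const found = [];
  const visit = (node) => {
    const tag = String(node.tagName || '').toLowerCase();
    if (INTERACTIVE_TAGS.has(tag)) {
      // Prefer an explicit aria-label, otherwise fall back to the visible text.
      const label =
        (node.getAttribute && node.getAttribute('aria-label')) ||
        (node.textContent || '').trim();
      found.push({ index: found.length, tag, label });
    }
    for (const child of Array.from(node.children || [])) {
      visit(child);
    }
  };
  visit(root);
  return found;
}

// Render the list as compact text that could be placed in a model prompt.
function describeElements(elements) {
  return elements
    .map((el) => `[${el.index}] <${el.tag}> ${el.label}`)
    .join('\n');
}
```

In a browser, `describeElements(extractInteractiveElements(document.body))` would produce one line per link, button, or form control. Indexing the elements gives the model a compact way to refer to a target without seeing the full markup.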

Quick Start & Requirements

  • Installation:
    • CDN: <script src="https://cdn.jsdelivr.net/npm/page-agent@latest/dist/umd/page-agent.js"></script>
    • NPM: npm install page-agent
  • Usage: Import PageAgent and instantiate with configuration. Example: const agent = new PageAgent({ modelName: '...', baseURL: '...', apiKey: '...' }); await agent.execute('Click the login button');
  • Prerequisites: A web browser environment. Node.js is required for NPM installation and development. The provided example uses a demo LLM endpoint (PAGE-AGENT-FREE-TESTING-RANDOM) with noted rate, prompt, and origin limits; users are advised to integrate their own LLM.
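
The usage line above can be wrapped in a small helper that checks the three configuration options before creating the agent. This is a sketch under stated assumptions: `buildAgentConfig` and `runInstruction` are hypothetical names, and the helper takes the `PageAgent` constructor as an argument because the exact import or global export shape is not confirmed here. Only the `modelName`, `baseURL`, and `apiKey` options and the `execute` method come from the usage example.

```javascript
// Hypothetical helpers around the usage shown above.
// Assumption: PageAgent is obtained from the CDN script or the npm package;
// how it is exported is not confirmed here, so it is passed in as AgentClass.

// Check that the three options from the usage example are non-empty strings.
function buildAgentConfig({ modelName, baseURL, apiKey }) {
  const config = { modelName, baseURL, apiKey };
  for (const [key, value] of Object.entries(config)) {
    if (typeof value !== 'string' || value.trim() === '') {
      throw new Error(`Missing PageAgent option: ${key}`);
    }
  }
  return config;
}

// Create an agent from the given constructor and run a single instruction.
async function runInstruction(AgentClass, options, instruction) {
  const agent = new AgentClass(buildAgentConfig(options));
  return agent.execute(instruction);
}
```

A call would then look like `await runInstruction(PageAgent, { modelName: '...', baseURL: '...', apiKey: '...' }, 'Click the login button')`, with the placeholders replaced by your own LLM settings.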

Highlighted Details

  • Easy Integration: Single script tag integration for immediate webpage enhancement.
  • Client-Side Processing: Enhances privacy and performance by processing logic within the user's browser.
  • DOM Extraction: Intelligent extraction of page structure for agent understanding.
  • Natural Language Interface: Enables control via conversational text commands.
  • UI with Human in the Loop: Interactive elements guide user confirmation and interaction.

Maintenance & Community

The project provides contributing guidelines and a Code of Conduct. Specific community channels (e.g., Discord, Slack) or notable contributors/sponsorships are not detailed in the README.

Licensing & Compatibility

Licensed under the MIT License. The project explicitly states that it is designed for client-side web enhancement, not server-side automation. The MIT License generally permits commercial use, but the licenses of third-party dependencies should be reviewed separately.

Limitations & Caveats

PageAgent is strictly client-side and does not support server-side automation. The free testing LLM endpoint is limited by rate, prompt, and origin, so users are strongly encouraged to supply their own LLM configuration for reliable operation.
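
When working against a rate-limited endpoint, one common mitigation is to retry a failed instruction with exponential backoff. The sketch below is a generic helper, not part of PageAgent: `withRetry` is a hypothetical name, and it assumes that a rate-limited or failed call surfaces as a rejected promise, which the source does not confirm.

```javascript
// Hypothetical helper: retries an async task with exponential backoff.
// Assumption: a failed or rate-limited call rejects its promise.
async function withRetry(task, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
      if (attempt < attempts - 1) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

It could be used as `await withRetry(() => agent.execute('Click the login button'))`. Retrying is only safe for instructions that can be repeated without side effects, since a partially completed action may run twice.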

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 16
  • Issues (30d): 16
  • Star History: 330 stars in the last 30 days
