on-device-browser-agent by RunanywhereAI

On-device AI browser automation

Created 5 months ago

297 stars

Top 89.2% on SourcePulse

Project Summary

This project provides on-device, AI-powered web automation that lets users perform complex tasks in the browser without cloud services or API keys. It targets developers and power users who want a private, offline way to automate web interactions such as data extraction and form filling.

How It Works

The system uses a multi-agent architecture: a Planner Agent decomposes the task into steps, and a Navigator Agent executes them. Inference runs locally through WebLLM with WebGPU acceleration. The Navigator Agent analyzes the current page's DOM to choose and execute actions such as clicking, typing, and extracting data. This loop repeats until the user's task completes or fails. All AI processing stays on the user's device throughout.
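The plan-then-act loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: every type and function name here (PlannerAgent, NavigatorAgent, PageDriver, runTask, maxActionsPerStep) is hypothetical, and the agents are plain synchronous functions, whereas real model inference would be asynchronous.

```typescript
// Hypothetical types for illustration; none of these names come from the project.
type Action = { kind: string; target?: string; text?: string };
type StepResult = { done: boolean; failed: boolean; note: string };

interface PlannerAgent {
  // Breaks the user's task into ordered, high-level steps.
  plan(task: string): string[];
}

interface NavigatorAgent {
  // Picks one concrete action for a step, given a text snapshot of the DOM.
  nextAction(step: string, domSnapshot: string): Action;
}

interface PageDriver {
  // Returns a text representation of the current page.
  snapshot(): string;
  // Executes the action and reports whether the step is done or has failed.
  perform(action: Action): StepResult;
}

function runTask(
  task: string,
  planner: PlannerAgent,
  navigatorAgent: NavigatorAgent,
  page: PageDriver,
  maxActionsPerStep = 5, // assumed safety cap so a stuck step cannot loop forever
): { status: "completed" | "failed"; log: string[] } {
  const log: string[] = [];
  for (const step of planner.plan(task)) {
    let stepDone = false;
    for (let i = 0; i < maxActionsPerStep && !stepDone; i++) {
      const action = navigatorAgent.nextAction(step, page.snapshot());
      const result = page.perform(action);
      log.push(`${step}: ${action.kind} -> ${result.note}`);
      if (result.failed) return { status: "failed", log };
      stepDone = result.done;
    }
    // The step never reported completion within the cap: treat the task as failed.
    if (!stepDone) return { status: "failed", log };
  }
  return { status: "completed", log };
}
```

Passing the agents and the page in as interfaces keeps the loop independent of any particular model runtime, so it can be exercised with stubs before a local model is wired in.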

Quick Start & Requirements

Installation: Clone the repository, navigate to the local-browser directory, and run npm install.
Build: Execute npm run build.
Load: Open chrome://extensions, enable "Developer mode," and click "Load unpacked," selecting the dist folder.
Prerequisites: Chrome 124+ (for WebGPU in service workers), Node.js 18+, npm, and a GPU with WebGPU support.
First Run: The initial launch downloads an AI model (approximately 1 GB), which is cached for subsequent use.
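The browser-side prerequisites above could be checked with a small helper like the one below. This is a sketch under stated assumptions: the helper names and the BrowserInfo shape are hypothetical, and only the Chrome 124 floor and the WebGPU requirement come from the list above. Finding the `gpu` property only shows that the WebGPU API is exposed; requesting an adapter can still fail on unsupported hardware.

```typescript
// Hypothetical helper names; only the version floor (124) and the WebGPU
// requirement come from the prerequisites listed above.
const MIN_CHROME_MAJOR = 124;

// Minimal shape we probe. In an extension you would pass the global `navigator`.
type BrowserInfo = { userAgent: string; gpu?: unknown };

// Extracts the Chrome major version from a user-agent string, or null if absent.
function chromeMajorVersion(userAgent: string): number | null {
  const match = /Chrome\/(\d+)/.exec(userAgent);
  return match ? Number(match[1]) : null;
}

// Reports each prerequisite separately so the caller can show a specific message.
function checkPrerequisites(info: BrowserInfo): {
  chromeOk: boolean;
  webGpuApiPresent: boolean;
} {
  const major = chromeMajorVersion(info.userAgent);
  return {
    chromeOk: major !== null && major >= MIN_CHROME_MAJOR,
    webGpuApiPresent: info.gpu !== undefined && info.gpu !== null,
  };
}
```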

Highlighted Details

On-Device AI: Utilizes WebLLM with WebGPU for local LLM inference, ensuring privacy.
Multi-Agent System: Employs Planner and Navigator agents for intelligent, step-by-step task execution.
Privacy-First: All AI computations are performed locally, so prompts and page content are not sent to a remote inference service.
Offline Support: Fully functional offline after the initial model download.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README.

Licensing & Compatibility

The project is released under the MIT License, which generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

This project is presented as a Proof-of-Concept (POC) and is not intended for production use. It has no vision capabilities and relies solely on text-based DOM analysis. It operates only on the currently active tab and supports only basic actions (navigate, click, type, extract, scroll, wait). Smaller models may struggle with highly complex tasks. Certain browser pages (e.g., chrome://) may prevent content script execution.
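Because the action set is small and fixed, model output can be validated strictly before anything touches the page. The sketch below shows one way to do that. It assumes the model emits one JSON object per action; the field names (kind, url, selector, text, direction, ms) and both function names are hypothetical, and only the six action names and the chrome:// example come from the text above.

```typescript
// The six action names come from the limitations above; the fields are assumptions.
type AgentAction =
  | { kind: "navigate"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "extract"; selector: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "wait"; ms: number };

const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v.length > 0;

// Parses raw model output; returns null for malformed JSON or unsupported actions.
function parseAction(raw: string): AgentAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { kind, url, selector, text, direction, ms } = data as Record<string, unknown>;
  switch (kind) {
    case "navigate":
      return isNonEmptyString(url) ? { kind, url } : null;
    case "click":
      return isNonEmptyString(selector) ? { kind, selector } : null;
    case "type":
      return isNonEmptyString(selector) && typeof text === "string"
        ? { kind, selector, text }
        : null;
    case "extract":
      return isNonEmptyString(selector) ? { kind, selector } : null;
    case "scroll":
      return direction === "up" || direction === "down" ? { kind, direction } : null;
    case "wait":
      return typeof ms === "number" && Number.isFinite(ms) && ms >= 0
        ? { kind, ms }
        : null;
    default:
      return null; // anything outside the six supported actions is rejected
  }
}

// Only chrome:// is named in the text above; other restricted schemes are not covered here.
function canRunContentScript(url: string): boolean {
  return !url.startsWith("chrome://");
}
```

Rejecting unknown actions outright, rather than attempting to repair them, keeps a small model's malformed output from turning into unintended page interactions.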

Health Check

Last Commit: 5 months ago
Responsiveness: Inactive

Star History

1 star in the last 30 days