browserbee by parsaghaffari

AI browser assistant for natural language web control

Created 10 months ago

959 stars

Top 38.2% on SourcePulse

1 Expert Loves This Project

Max Liu

Cofounder of PingCAP

Project Summary

BrowserBee is an open-source Chrome extension that acts as a privacy-first, AI-powered browser assistant, letting users control the browser with natural language commands. It targets users who want a personal AI assistant for tasks such as social media management, news curation, research, and knowledge summarization. Because it runs primarily within the browser, it aims to offer both convenience and security.

How It Works

BrowserBee combines a Large Language Model (LLM), which interprets user instructions and plans the actions to take, with Playwright, which carries out the browser automation. This lets it interact with web pages and execute multi-step sequences of actions while maintaining privacy by processing data locally. Running Playwright inside a browser extension is highlighted as a novel approach that simplifies browser automation for end users, compared with traditional architectures in which a backend service drives the browser.
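The plan-then-act loop described above can be sketched as follows. This is a minimal illustration, not BrowserBee's actual code: the `ToolCall` shape, `runAgent`, `scriptedPlan`, and `fakeExecute` are all hypothetical names, the planner is a synchronous scripted stub standing in for an LLM request, and the executor returns strings instead of driving a page through Playwright.

```typescript
// Hypothetical tool-call shape; BrowserBee's real tool schema may differ.
type ToolCall =
  | { tool: "navigate"; url: string }
  | { tool: "click"; selector: string }
  | { tool: "done"; summary: string };

// Planner: given the instruction and what has been observed so far,
// choose the next tool call. A real agent would query an LLM here.
type Planner = (instruction: string, observations: string[]) => ToolCall;

// Executor: perform one tool call and report what happened.
// A real agent would drive the browser here.
type Executor = (call: ToolCall) => string;

function runAgent(
  instruction: string,
  plan: Planner,
  execute: Executor,
  maxSteps: number = 10
): string[] {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const call = plan(instruction, observations);
    if (call.tool === "done") {
      observations.push(`done: ${call.summary}`);
      return observations;
    }
    // Feed each result back so the planner can react to it on the next step.
    observations.push(execute(call));
  }
  observations.push("stopped: step limit reached");
  return observations;
}

// Scripted stand-in for the LLM: replays a fixed sequence of tool calls.
const scriptedPlan: Planner = (_instruction, observations) => {
  const script: ToolCall[] = [
    { tool: "navigate", url: "https://example.com" },
    { tool: "click", selector: "a" },
    { tool: "done", summary: "followed the first link" },
  ];
  return script[Math.min(observations.length, script.length - 1)];
};

// Stand-in for the browser: describes the action instead of performing it.
const fakeExecute: Executor = (call) => {
  switch (call.tool) {
    case "navigate":
      return `navigated to ${call.url}`;
    case "click":
      return `clicked ${call.selector}`;
    case "done":
      return "nothing to execute";
  }
};

const trace = runAgent(
  "open example.com and follow the first link",
  scriptedPlan,
  fakeExecute
);
```

The step limit is a design choice in this sketch rather than something the source describes; it keeps a confused planner from looping indefinitely.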

Quick Start & Requirements

Installation: Download the latest release, unzip it, and load it as an unpacked extension in Chrome (chrome://extensions/ -> Developer mode -> Load unpacked). Alternatively, install from the Chrome Web Store, or build from source (npm install or pnpm install, then npm run build or pnpm build) and load the dist directory as an unpacked extension.
Prerequisites: LLM API keys for supported providers (Anthropic, OpenAI, Gemini) or Ollama configuration.
Usage: Open the side panel (toolbar icon or Alt+Shift+B), enter a natural language command, and press Enter.
Notes: Requires an open base tab to attach to via the Chrome DevTools Protocol (CDP); cannot attach to tabs showing chrome:// or chrome-extension:// URLs.
Documentation: ROADMAP.md
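The attachment restriction in the notes above could be checked before attaching with something like the sketch below. The names `isAttachableUrl`, `pickBaseTab`, and `TabInfo` are hypothetical, and treating an empty URL as non-attachable is an assumption of this example, not documented behavior.

```typescript
// URL prefixes that the notes above say cannot be attached to.
const BLOCKED_PREFIXES: string[] = ["chrome://", "chrome-extension://"];

// Returns true when a tab URL looks attachable under the stated restriction.
function isAttachableUrl(url: string): boolean {
  const normalized = url.trim().toLowerCase();
  if (normalized.length === 0) {
    return false; // assumption: a tab without a URL is not a usable base tab
  }
  return !BLOCKED_PREFIXES.some((prefix) => normalized.startsWith(prefix));
}

// Hypothetical minimal tab record; a real extension would get this from the browser.
interface TabInfo {
  id: number;
  url: string;
}

// Picks the first tab that can serve as a base tab, or undefined if none qualifies.
function pickBaseTab(tabs: TabInfo[]): TabInfo | undefined {
  return tabs.find((tab) => isAttachableUrl(tab.url));
}

const candidate = pickBaseTab([
  { id: 1, url: "chrome://extensions/" },
  { id: 2, url: "chrome-extension://abc/panel.html" },
  { id: 3, url: "https://example.com/" },
]);
```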

Highlighted Details

Supports major LLM providers (OpenAI, Anthropic, Gemini, Ollama) and includes token usage tracking.
Features a comprehensive set of browser interaction tools, including navigation, tab management, element interaction, DOM querying, and screenshotting.
Includes a "memory" feature to save and reuse efficient tool sequences, potentially reducing token costs.
Agents can request user approval for sensitive actions like purchases or social media posts.
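The approval step for sensitive actions might look roughly like the sketch below. It is an illustration under assumptions: the `Action` shape, the `sensitive` flag, and `runWithApproval` are hypothetical, and the approver is a plain callback where a real extension would prompt the user in its side panel.

```typescript
// Hypothetical action record; the `sensitive` flag is an assumption of this sketch.
interface Action {
  description: string;
  sensitive: boolean;
}

// Approver: returns true if the user allows the action.
type Approver = (action: Action) => boolean;

// Runs actions in order, asking for approval before each sensitive one.
function runWithApproval(actions: Action[], approve: Approver): string[] {
  const log: string[] = [];
  for (const action of actions) {
    if (action.sensitive && !approve(action)) {
      log.push(`skipped: ${action.description}`);
      continue;
    }
    log.push(`executed: ${action.description}`);
  }
  return log;
}

const actions: Action[] = [
  { description: "add item to cart", sensitive: false },
  { description: "place order", sensitive: true },
];

// With an approver that declines everything, only the non-sensitive action runs.
const declinedLog = runWithApproval(actions, () => false);
```

Skipping a declined action and continuing is one possible policy; stopping the whole sequence on a refusal would be an equally reasonable choice.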

Maintenance & Community

The project is actively developed by parsaghaffari.
Contribution guidelines are available in CONTRIBUTING.md.

Licensing & Compatibility

License: Apache 2.0.
Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Interacting with web pages remains challenging for LLM agents because DOMs and screenshots have low information density. Good performance therefore depends on simplified page representations and efficient models.

Health Check

Last Commit

4 months ago

Responsiveness

1 day

Star History

8 stars in the last 30 days