gemma-gem  by kessler

Browser-based AI agent for on-device web interaction

Created 6 days ago

New!

629 stars

Top 52.5% on SourcePulse

GitHubView on GitHub
Project Summary

Gemma Gem provides an on-device AI assistant directly within the browser, leveraging Google's Gemma 4 model through WebGPU. It empowers users to interact with web pages by reading content, performing actions like clicking and form filling, and executing JavaScript, all while ensuring data privacy as no information leaves the user's machine and no API keys are needed. This offers a powerful, private, and offline-capable solution for web automation and information retrieval.

How It Works

The architecture employs a Service Worker for message routing and handling specific actions like screenshot capture and JavaScript execution. An Offscreen Document hosts the Gemma 4 model, utilizing @huggingface/transformers and WebGPU for efficient, on-device inference. Content Scripts inject the user interface and manage DOM manipulation tools, enabling interaction with web page elements. This separation allows for robust, hardware-accelerated AI processing locally, with an agent loop orchestrating tool usage for complex tasks.

Quick Start & Requirements

  • Primary install: Run pnpm install followed by pnpm build.
  • Prerequisites: Chrome browser with WebGPU support.
  • Disk Space: Approximately 500MB for the E2B model, 1.5GB for E4B (cached after first use).
  • Setup: Load the extension from the .output/chrome-mv3-dev/ directory in chrome://extensions using developer mode.

Highlighted Details

  • On-device execution of Google's Gemma 4 (E2B/E4B) models using WebGPU, ensuring data privacy and offline capability.
  • Full browser automation features: reading page content, clicking elements, typing text, and executing JavaScript directly within the page context.
  • No external API keys or cloud dependencies required, minimizing costs and security risks.
  • Supports large context windows (128K tokens) and offers model selection (E2B ~500MB, E4B ~1.5GB).
  • A comprehensive set of tools for interacting with web page elements and structure.

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap. The focus is on the technical implementation and architecture.

Licensing & Compatibility

The README does not explicitly state a software license. The project utilizes the WXT framework and @huggingface/transformers, which have their own licensing terms. Compatibility is primarily targeted at Chrome browsers supporting WebGPU.

Limitations & Caveats

Primarily designed as a Chrome extension requiring developer mode for installation, suggesting an ongoing development status. Explicit licensing details are absent, potentially impacting commercial use or integration. Performance characteristics on diverse hardware and potential conflicts with complex web applications are not detailed.

Health Check
Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
3
Star History
632 stars in the last 6 days

Explore Similar Projects

Feedback? Help us improve.