gemma-gem by kessler

Browser-based AI agent for on-device web interaction

Created 1 month ago

921 stars

Top 39.2% on SourcePulse

Project Summary

Gemma Gem is a browser extension that runs Google's Gemma 4 model on-device through WebGPU. The assistant can read page content, click elements, fill in forms, and execute JavaScript on the current page. Inference runs locally, so no page data leaves the user's machine and no API keys are needed. Once the model is cached, the assistant can work offline, which makes it a private option for web automation and information retrieval.

How It Works

The extension is split across three contexts. A Service Worker routes messages and handles screenshot capture and JavaScript execution. An Offscreen Document hosts the Gemma 4 model and runs inference with @huggingface/transformers on WebGPU. Content Scripts inject the user interface and provide the DOM manipulation tools that act on page elements. An agent loop orchestrates tool calls for multi-step tasks, while inference stays local and hardware-accelerated.
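
The division of labor described above can be pictured as a small routing function. This is a minimal sketch, not the project's code: the message shapes, the `AgentMessage` and `Context` types, and `routeMessage` are hypothetical names chosen for illustration, and the summary does not describe the real message format.

```typescript
// Hypothetical message shapes; the project's actual message types are not documented here.
type AgentMessage =
  | { kind: "generate"; prompt: string } // run model inference
  | { kind: "screenshot" } // capture the visible tab
  | { kind: "executeJs"; code: string } // run JavaScript in the page
  | { kind: "domTool"; tool: string; selector: string }; // e.g. click or type

type Context = "service-worker" | "offscreen-document" | "content-script";

// Decide which extension context should handle a message,
// following the split described in the text.
function routeMessage(msg: AgentMessage): Context {
  switch (msg.kind) {
    case "generate":
      return "offscreen-document"; // the model lives here
    case "screenshot":
    case "executeJs":
      return "service-worker"; // handled by the service worker
    case "domTool":
      return "content-script"; // DOM tools run next to the page
  }
}
```

Keeping the routing decision in one pure function makes it easy to test without loading the extension; in a real extension the result would select a messaging call rather than return a string.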

Quick Start & Requirements

Primary install: Run pnpm install followed by pnpm build.
Prerequisites: Chrome browser with WebGPU support.
Disk Space: Approximately 500MB for the E2B model, 1.5GB for E4B (cached after first use).
Setup: In chrome://extensions, enable developer mode and load the unpacked extension from the .output/chrome-mv3-dev/ directory.
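
Because WebGPU support is the main prerequisite, it can be worth checking for it before loading the extension. The sketch below is an assumption-laden illustration, not part of the project: `NavigatorLike`, `GpuLike`, `hasWebGpuApi`, and `canUseWebGpu` are hypothetical names, and the structural types stand in for the real WebGPU typings so the snippet compiles on its own. The only browser API it relies on is `navigator.gpu.requestAdapter()`.

```typescript
// Minimal structural types so the sketch compiles without WebGPU type packages.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Cheap synchronous check: does the browser expose the WebGPU entry point at all?
function hasWebGpuApi(nav: NavigatorLike): boolean {
  return nav.gpu != null;
}

// Stronger check: requestAdapter() resolves to null when no suitable adapter is available.
async function canUseWebGpu(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false;
  const adapter = await nav.gpu.requestAdapter();
  return adapter != null;
}

// In a browser console or page script:
//   canUseWebGpu(navigator as NavigatorLike).then((ok) => console.log("WebGPU usable:", ok));
```

Passing the navigator object in as a parameter, rather than reading the global, keeps the functions usable in tests and in contexts where `navigator` is absent.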

Highlighted Details

On-device execution of Google's Gemma 4 (E2B/E4B) models using WebGPU, ensuring data privacy and offline capability.
Full browser automation features: reading page content, clicking elements, typing text, and executing JavaScript directly within the page context.
No external API keys or cloud dependencies required, minimizing costs and security risks.
Supports large context windows (128K tokens) and offers model selection (E2B ~500MB, E4B ~1.5GB).
A comprehensive set of tools for interacting with web page elements and structure.
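
The combination of page tools and an agent loop can be sketched as follows. This is a simplified illustration under stated assumptions: every name (`runAgent`, `ModelStep`, `read_page`, and so on) is hypothetical, the "model" is a scripted stand-in rather than Gemma 4, and the loop is synchronous for brevity, whereas real inference and DOM calls would be asynchronous.

```typescript
// All names here are hypothetical; they illustrate the pattern, not the project's API.
type ToolCall = { tool: string; args: Record<string, string> };
type ModelStep =
  | { type: "tool_call"; call: ToolCall }
  | { type: "final"; answer: string };

type Tool = (args: Record<string, string>) => string;
type Model = (transcript: string[]) => ModelStep;

// Ask the model for a step, run the requested tool, append the result,
// and repeat until the model answers or the step budget runs out.
function runAgent(model: Model, tools: Record<string, Tool>, task: string, maxSteps = 5): string {
  const transcript: string[] = [`user: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(transcript);
    if (next.type === "final") return next.answer;
    const tool = tools[next.call.tool];
    const result = tool ? tool(next.call.args) : `error: unknown tool "${next.call.tool}"`;
    transcript.push(`tool ${next.call.tool}: ${result}`);
  }
  return "stopped: step limit reached";
}

// Stub tool and a scripted stand-in for the model, for illustration only.
const demoTools: Record<string, Tool> = {
  read_page: () => "Example Domain",
};
const scriptedModel: Model = (transcript) => {
  const [, observation] = transcript; // second entry, if present, is the tool result
  return observation === undefined
    ? { type: "tool_call", call: { tool: "read_page", args: {} } }
    : { type: "final", answer: `Observed -> ${observation}` };
};

const answer = runAgent(scriptedModel, demoTools, "What does this page say?");
```

The step limit is a design choice worth keeping in any variant of this loop: a small on-device model can repeat tool calls indefinitely, and a budget guarantees the loop terminates.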

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap. The focus is on the technical implementation and architecture.

Licensing & Compatibility

The README does not state a software license. The project builds on the WXT framework and @huggingface/transformers, each of which carries its own license terms. The extension targets Chrome browsers with WebGPU support.

Limitations & Caveats

The project is distributed as a Chrome extension that must be loaded in developer mode, which suggests it is still under active development. The README states no license, which may affect commercial use or integration. Performance on different hardware and behavior on complex web applications are not documented.

Explore Similar Projects

on-device-browser-agent by RunanywhereAI

skills by browser-act

desktop by browser-use

awesome-autonomous-web by Agent-Tools

agentboard by gbasin

surf-cli by nicobailon

opentabs by opentabs-dev

molmoweb by allenai

open-chatgpt-atlas by ComposioHQ

browserable by browserable

actionbook by actionbook

skills by browserbase