surf-cli  by nicobailon

CLI for AI agents to control Chrome

Created 1 month ago
261 stars

Top 97.3% on SourcePulse

Summary

Surf-cli addresses the complexity of browser automation for AI agents by providing a zero-configuration, agent-agnostic command-line interface. It enables AI models and custom scripts to control Chrome browsers programmatically, simplifying tasks like web scraping, form filling, and interaction with dynamic web applications without vendor lock-in or intricate setup.

How It Works

The system consists of a CLI tool that communicates over a Unix socket with a native host process. The host talks to a Chrome extension, which uses the Chrome DevTools Protocol (CDP) and the chrome.scripting API to control the browser. Because the only interface is the command line, any agent that can execute CLI commands can use Surf. The design prioritizes zero configuration, reliability informed by reverse-engineering production extensions, and graceful fallbacks when CDP operations fail.
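The request path and the fallback behaviour described above can be illustrated with a small, self-contained sketch. It is not Surf's actual protocol: the newline-delimited JSON message shape, the strategy names, and every function below are assumptions made for illustration, and a socket pair stands in for the Unix socket between the CLI and the native host.

```python
import json
import socket
import threading


def handle_request(request, strategies):
    """Try each strategy in order and return the first success (graceful fallback)."""
    errors = []
    for name, strategy in strategies:
        try:
            return {"ok": True, "via": name, "result": strategy(request)}
        except Exception as exc:  # a failed strategy just moves us to the next one
            errors.append(f"{name}: {exc}")
    return {"ok": False, "errors": errors}


def serve_one(conn, strategies):
    """Hypothetical host side: read one JSON line, answer with one JSON line."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        stream.write(json.dumps(handle_request(request, strategies)) + "\n")
        stream.flush()


def send_command(conn, command, **args):
    """Hypothetical CLI side: send one command and wait for the reply."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        stream.write(json.dumps({"command": command, "args": args}) + "\n")
        stream.flush()
        return json.loads(stream.readline())


def cdp_click(request):
    # Simulated CDP failure, e.g. the debugger could not attach.
    raise RuntimeError("debugger not attached")


def scripting_click(request):
    # Simulated fallback path standing in for a chrome.scripting-based click.
    return f"clicked {request['args']['ref']}"


def demo():
    cli_end, host_end = socket.socketpair()  # stands in for the Unix socket
    strategies = [("cdp", cdp_click), ("scripting", scripting_click)]
    host = threading.Thread(target=serve_one, args=(host_end, strategies))
    host.start()
    reply = send_command(cli_end, "click", ref="e5")
    host.join()
    return reply


if __name__ == "__main__":
    print(demo())
```

In this sketch the caller never learns which mechanism failed; it only sees which one succeeded in the `via` field. That is one plausible way to keep the CLI surface identical regardless of what the host does internally.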

Quick Start & Requirements

  • Primary Install: npm install -g surf-cli
  • Prerequisites: Node.js and npm. On Linux, the required packages are chromium-browser, nodejs, npm, and imagemagick.
  • Setup: After the global install, run surf extension-path to get the extension's location, load that path in Chrome at chrome://extensions (with Developer mode enabled), then run surf install to register the native host.
  • Links: Official quick-start and usage details are available within the README.
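The setup order above can be scripted roughly as follows. This is a sketch under assumptions: it uses only the three commands named on this page, it assumes `surf extension-path` prints the path to standard output, and it pauses for the manual step in chrome://extensions, which cannot be automated here. If the README requires extra arguments for any command, those take precedence. The function names and the injectable `run` and `prompt` parameters are hypothetical conveniences.

```python
import shutil
import subprocess


def missing_prerequisites(tools=("node", "npm"), which=shutil.which):
    """Return the prerequisite executables that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def setup_surf(run=subprocess.run, prompt=input):
    """Run the setup steps in order and return the reported extension path."""
    # 1. Global install of the CLI.
    run(["npm", "install", "-g", "surf-cli"], check=True)

    # 2. Ask the CLI where the extension lives (assumed to be printed on stdout).
    result = run(["surf", "extension-path"], check=True,
                 capture_output=True, text=True)
    extension_path = result.stdout.strip()

    # 3. Manual step: load the unpacked extension in Chrome.
    prompt(f"Load {extension_path} at chrome://extensions "
           "(Developer mode enabled), then press Enter to continue. ")

    # 4. Register the native host.
    run(["surf", "install"], check=True)
    return extension_path


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        raise SystemExit(f"Missing prerequisites: {', '.join(missing)}")
    print("Extension loaded from:", setup_surf())
```

Passing `run` and `prompt` as parameters keeps the sequence testable without touching the real system.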

Highlighted Details

  • AI Integration without API Keys: Interact with services like ChatGPT, Gemini, Perplexity, and Grok using existing browser logins, eliminating the need for API keys.
  • Network Capture: Automatically logs all network requests during active sessions, allowing for filtering, searching, and replaying API calls.
  • Semantic Locators & Element Refs: Interact with web elements using stable, semantic identifiers derived from the accessibility tree (e.g., by role, text, or label), enhancing resilience against DOM changes.
  • Workflows: Define and execute multi-step browser automation sequences either inline via pipe-separated commands or from JSON files, featuring automatic waits for stability and argument passing.
  • Device Emulation: Supports emulating various mobile and tablet devices for responsive design testing.
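To make the semantic-locator idea above concrete, here is a toy sketch of looking up elements by role and accessible name in a simplified accessibility tree. Everything in it is hypothetical: the `AXNode` type, the `e1`, `e2`, ... ref scheme, and the matching rules are illustrative assumptions, not Surf's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AXNode:
    """A toy accessibility-tree node: a role, an accessible name, and children."""
    role: str
    name: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def assign_refs(root):
    """Give every node a short ref (e1, e2, ...) in document order."""
    return {f"e{i}": node for i, node in enumerate(walk(root), start=1)}


def find(root, role=None, name=None):
    """Return refs of nodes matching the role and/or a case-insensitive name substring."""
    return [
        ref
        for ref, node in assign_refs(root).items()
        if (role is None or node.role == role)
        and (name is None or name.lower() in node.name.lower())
    ]


form = AXNode("form", "Sign in", [
    AXNode("textbox", "Email"),
    AXNode("textbox", "Password"),
    AXNode("button", "Sign in"),
])

# The same form after a markup change that wraps the button in a generic container.
wrapped = AXNode("form", "Sign in", [
    AXNode("textbox", "Email"),
    AXNode("textbox", "Password"),
    AXNode("generic", "", [AXNode("button", "Sign in")]),
])

if __name__ == "__main__":
    print(find(form, role="button", name="sign in"))
    print(find(wrapped, role="button", name="sign in"))
```

The point of the sketch is that a lookup by role and name still finds exactly one button after the markup changes, whereas a selector tied to the element's position in the DOM would need updating.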

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license permits broad use, including commercial applications and linking with closed-source projects.

Limitations & Caveats

Automation is restricted for chrome:// internal pages and the Chrome Web Store due to browser security policies. The initial CDP operation on a new tab incurs a ~100-500ms delay for debugger attachment. Linux support is experimental and requires specific dependencies and potential headless server configurations (Xvfb, VNC).
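An agent can avoid wasted calls by checking a target URL against these restrictions before issuing commands. The following is a minimal sketch, not part of Surf: the function name is hypothetical, and the specific Web Store hostnames are an assumption based on the store's public addresses rather than anything stated in the project.

```python
from urllib.parse import urlparse

# Assumed values for illustration; adjust to match the project's documentation.
RESTRICTED_SCHEMES = {"chrome"}
WEB_STORE_HOSTS = {"chromewebstore.google.com"}


def is_restricted(url):
    """Return True if the URL is an internal chrome:// page or a Chrome Web Store page."""
    parts = urlparse(url)
    if parts.scheme in RESTRICTED_SCHEMES:
        return True
    host = (parts.hostname or "").lower()
    if host in WEB_STORE_HOSTS:
        return True
    # Older Web Store address, served under a path on chrome.google.com.
    return host == "chrome.google.com" and parts.path.startswith("/webstore")


if __name__ == "__main__":
    for candidate in ("chrome://extensions", "https://example.com/login"):
        print(candidate, "restricted" if is_restricted(candidate) else "ok")
```

A check like this only saves a round trip; the browser's own security policy remains the authority on what can be automated.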

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 11
  • Issues (30d): 2
  • Star History: 236 stars in the last 30 days
