surf-cli  by nicobailon

CLI for AI agents to control Chrome

Created 3 months ago
424 stars

Top 69.6% on SourcePulse

Summary

Surf-cli addresses the complexity of browser automation for AI agents by providing a zero-configuration, agent-agnostic command-line interface. It enables AI models and custom scripts to control Chrome browsers programmatically, simplifying tasks like web scraping, form filling, and interaction with dynamic web applications without vendor lock-in or intricate setup.

How It Works

Surf consists of a CLI tool that communicates over a Unix socket with a native host process. The host interacts with a Chrome extension, which controls the browser through the Chrome DevTools Protocol (CDP) and the chrome.scripting API. This architecture keeps Surf agent-agnostic: any agent that can execute CLI commands can use it. The design prioritizes zero configuration, reliability informed by reverse-engineering production extensions, and graceful fallbacks when CDP operations fail.
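
The CLI-to-host hop described above can be illustrated with a small, self-contained sketch. It is not Surf's actual wire protocol: the newline-delimited JSON framing, the field names (id, command, ok, echo), the socket filename, and the stand-in host are all assumptions made for illustration. It only shows the general shape of a CLI process sending one request over a Unix socket and reading one reply.

```python
import json
import os
import socket
import tempfile
import threading


def serve_once(server: socket.socket) -> None:
    """Stand-in for a native host: answer exactly one request, then return."""
    conn, _ = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        request = json.loads(reader.readline())
        # Hypothetical reply shape; a real host would talk to the extension here.
        reply = {"id": request["id"], "ok": True, "echo": request["command"]}
        conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))


def send_command(socket_path: str, command: str, request_id: int = 1) -> dict:
    """CLI side: write one JSON line to the Unix socket and read one JSON line back."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        payload = {"id": request_id, "command": command}
        client.sendall((json.dumps(payload) + "\n").encode("utf-8"))
        with client.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


def demo() -> dict:
    """Start the stand-in host on a temporary socket and send it one command."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "host.sock")  # hypothetical socket location
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            worker = threading.Thread(target=serve_once, args=(server,))
            worker.start()
            reply = send_command(path, "example-command")  # placeholder command name
            worker.join()
            return reply


if __name__ == "__main__":
    print(demo())
```

Keeping the CLI as a thin client of a long-lived host is what lets any process that can spawn a command drive the browser, without linking against a browser-automation library.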

Quick Start & Requirements

  • Primary Install: npm install -g surf-cli
  • Prerequisites: Node.js and npm. On Linux, the listed dependencies are chromium-browser, nodejs, npm, and imagemagick.
  • Setup: After the global install, run surf extension-path to print the extension's location, load the extension from that path in Chrome via chrome://extensions (with Developer mode enabled), then run surf install to register the native host.
  • Links: Official quick-start and usage details are available within the README.
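
The setup steps above could be scripted roughly as follows. This is a sketch rather than an official installer: only the three commands named in the list (npm install -g surf-cli, surf extension-path, surf install) come from the summary, while the helper names run_cli and setup, the confirmation prompt, and the assumption that surf extension-path prints the path on stdout are illustrative. Any additional arguments the commands may accept are not covered here; the README is the authority.

```python
import subprocess
from typing import Callable, Sequence


def run_cli(argv: Sequence[str], runner: Callable = subprocess.run) -> str:
    """Run one command and return its trimmed stdout; raises if it exits non-zero."""
    result = runner(list(argv), check=True, capture_output=True, text=True)
    return result.stdout.strip()


def setup(
    runner: Callable = subprocess.run,
    confirm: Callable[[str], object] = input,
) -> str:
    """Walk through the setup steps in order and return the extension path."""
    # Step 1: global install of the CLI.
    run_cli(["npm", "install", "-g", "surf-cli"], runner)
    # Step 2: ask the CLI where the bundled extension lives (assumed to be on stdout).
    path = run_cli(["surf", "extension-path"], runner)
    # Step 3 is manual: load the extension in chrome://extensions with Developer mode on.
    confirm(f"Load the extension from {path} in chrome://extensions, then press Enter: ")
    # Step 4: register the native host.
    run_cli(["surf", "install"], runner)
    return path


if __name__ == "__main__":
    print("Extension loaded from:", setup())
```

The runner and confirm parameters exist only so the sequence can be exercised without touching npm or Chrome, for example by passing a fake runner in a test.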

Highlighted Details

  • AI Integration without API Keys: Interact with services like ChatGPT, Gemini, Perplexity, and Grok using existing browser logins, eliminating the need for API keys.
  • Network Capture: Automatically logs all network requests during active sessions, allowing for filtering, searching, and replaying API calls.
  • Semantic Locators & Element Refs: Interact with web elements using stable, semantic identifiers derived from the accessibility tree (e.g., by role, text, or label), enhancing resilience against DOM changes.
  • Workflows: Define and execute multi-step browser automation sequences either inline via pipe-separated commands or from JSON files, featuring automatic waits for stability and argument passing.
  • Device Emulation: Supports emulating various mobile and tablet devices for responsive design testing.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license permits broad use, including commercial applications and linking with closed-source projects.

Limitations & Caveats

Browser security policies prevent automation on chrome:// internal pages and the Chrome Web Store. The first CDP operation on a new tab incurs a delay of roughly 100-500 ms while the debugger attaches. Linux support is experimental: it requires specific dependencies and, on headless servers, may need a virtual display setup (Xvfb, VNC).

Health Check

  • Last Commit: 21 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 31
  • Issues (30d): 4
  • Star History: 86 stars in the last 30 days
