computer-use-preview by google

Automate web browsing with natural language commands

Created 5 months ago
1,318 stars

Top 30.3% on SourcePulse

Project Summary

This project provides a browser automation agent capable of executing natural language instructions. It targets developers and researchers needing to automate web interactions programmatically, offering a way to translate high-level commands into browser actions via large language models. The primary benefit is enabling complex web task automation through simple, human-readable queries.

How It Works

The agent interprets natural language queries using either Google's Gemini Developer API or Vertex AI. It then leverages browser automation libraries, specifically Playwright for local execution or Browserbase for cloud-based control, to interact with web pages. This approach allows for dynamic, intent-driven web navigation and task completion without manual scripting for each step.
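The loop described above can be sketched as follows. This is a minimal illustration, not the project's code: `Action`, `FakeBrowser`, `run_agent`, and `scripted_model` are hypothetical names, the model is replaced by a scripted stand-in, and the browser only records calls instead of driving Playwright or Browserbase.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One browser step proposed by the model (hypothetical shape)."""
    name: str                                  # e.g. "navigate", "click", "done"
    args: dict = field(default_factory=dict)


class FakeBrowser:
    """Stand-in for a Playwright or Browserbase session; it only records calls."""

    def __init__(self, initial_url: str) -> None:
        self.url = initial_url
        self.log: list[str] = []

    def apply(self, action: Action) -> str:
        if action.name == "navigate":
            self.url = action.args["url"]
        self.log.append(f"{action.name} {action.args}")
        # A real agent would return a screenshot or page state here.
        return f"page at {self.url}"


def run_agent(
    query: str,
    propose: Callable[[str, str, list], Action],
    browser: FakeBrowser,
    max_steps: int = 10,
) -> list[Action]:
    """Ask the model for the next action, apply it, and repeat until 'done'."""
    history: list[Action] = []
    observation = f"page at {browser.url}"
    for _ in range(max_steps):
        action = propose(query, observation, history)
        if action.name == "done":
            break
        observation = browser.apply(action)
        history.append(action)
    return history


def scripted_model(query: str, observation: str, history: list) -> Action:
    """Scripted stand-in for the language model: replays a fixed plan."""
    plan = [
        Action("navigate", {"url": "https://example.com"}),
        Action("click", {"x": 10, "y": 20}),
        Action("done"),
    ]
    return plan[len(history)]


if __name__ == "__main__":
    browser = FakeBrowser("about:blank")
    for step in run_agent("open example.com and click", scripted_model, browser):
        print(step.name, step.args)
```

In this sketch, swapping `scripted_model` for a call to a real model and `FakeBrowser` for a real browser session would leave the loop itself unchanged.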

Quick Start & Requirements

  • Installation: Clone the repository, set up a Python virtual environment (python3 -m venv .venv, source .venv/bin/activate), install dependencies (pip install -r requirements.txt), and install Playwright's browser and system dependencies (playwright install-deps chrome, playwright install chrome).
  • Prerequisites: Python 3.x, the dependencies listed in requirements.txt, either a Google Gemini API key or a Vertex AI project ID and location, and the Chrome browser.
  • Configuration: Set environment variables for model access: either GEMINI_API_KEY, or USE_VERTEXAI together with VERTEXAI_PROJECT and VERTEXAI_LOCATION. If using the Browserbase environment, also set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID.
  • Running: Execute python main.py --query "Your natural language command" with optional --env (playwright or browserbase) and --initial_url flags.
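Before running the agent, it can help to check that the environment variables listed above are set. The helper below is a hypothetical pre-flight check, not part of the project; in particular, treating any non-empty USE_VERTEXAI value as "enabled" is an assumption, since the summary does not say how the project parses it.

```python
import os
from typing import Mapping


def missing_settings(env: Mapping[str, str], browser_env: str = "playwright") -> list[str]:
    """Return the names of required environment variables that are unset."""
    missing: list[str] = []
    # Assumption: any non-empty USE_VERTEXAI value selects Vertex AI.
    if env.get("USE_VERTEXAI"):
        missing += [k for k in ("VERTEXAI_PROJECT", "VERTEXAI_LOCATION") if not env.get(k)]
    elif not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY")
    # Browserbase credentials only matter for the cloud environment.
    if browser_env == "browserbase":
        missing += [k for k in ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID") if not env.get(k)]
    return missing


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("Configuration looks complete.")
```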

Highlighted Details

  • Supports two distinct execution environments: local Playwright control and cloud-based Browserbase integration.
  • Offers a highlight_mouse option for visual debugging during Playwright execution.
  • Provides a flexible CLI with arguments for query, environment, and initial URL.
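A command-line interface with the arguments listed above could look roughly like this `argparse` sketch. It is not the project's actual main.py: `build_parser` is a hypothetical name, and the default environment, the required `--query`, and the shape of `--highlight_mouse` as a boolean flag are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags described in the summary."""
    parser = argparse.ArgumentParser(
        description="Run a browser agent from a natural language query."
    )
    parser.add_argument("--query", required=True,
                        help="Natural language command for the agent.")
    # Assumption: playwright is the default environment.
    parser.add_argument("--env", choices=["playwright", "browserbase"],
                        default="playwright",
                        help="Where the browser runs.")
    parser.add_argument("--initial_url", default=None,
                        help="Page to open before the agent starts.")
    # Assumption: highlight_mouse is exposed as a boolean switch.
    parser.add_argument("--highlight_mouse", action="store_true",
                        help="Show the mouse position for visual debugging.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```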

Maintenance & Community

Information regarding maintainers, community channels (like Discord or Slack), sponsorships, or a public roadmap is not detailed in the provided README.

Licensing & Compatibility

The README does not specify a software license. Therefore, licensing terms, restrictions, and compatibility for commercial or closed-source use are undetermined.

Limitations & Caveats

As a "preview" release, the project may be experimental and subject to significant changes. It cannot run without a Gemini API key or Vertex AI credentials, and the Browserbase environment additionally requires Browserbase credentials, which raises the barrier to adoption. The lack of explicit licensing information is a risk for integration into production systems.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 35
  • Issues (30d): 11
  • Star history: 1,329 stars in the last 30 days
