browserpilot by handrew

Web browsing agent controlled by natural language

Created 2 years ago

629 stars

Top 52.7% on SourcePulse

1 Expert Loves This Project

Jerry Liu

Cofounder of LlamaIndex

Project Summary

BrowserPilot lets developers and researchers control a web browser in natural language for automation and scraping tasks, as an alternative to writing brittle code by hand. It uses an LLM to translate plain-English instructions into executable Selenium commands.

How It Works

The core approach uses GPT-3 (or GPT-3.5-turbo) to interpret natural language instructions and generate Python Selenium code. The LLM is given a set of predefined environment actions (e.g., env.click, env.send_keys, env.find_element) and composes sequences of them, so automation follows the stated intent rather than a rigid, pre-scripted workflow. The system can also maintain conversational memory and retrieve information from browsed pages.
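The flow described above can be illustrated with a minimal, browser-free sketch. Everything here is an assumption made for illustration: StubEnv, fake_llm, and run_instruction are hypothetical names, env.get is not mentioned in the summary, and the argument shapes of env.click, env.send_keys, and env.find_element are guesses rather than BrowserPilot's actual signatures. The model call is replaced by canned output so the example runs offline.

```python
class StubEnv:
    """Hypothetical stand-in for the environment object.

    It records each call instead of driving a real browser.
    """

    def __init__(self):
        self.log = []

    def get(self, url):  # assumed helper, not named in the summary
        self.log.append(("get", url))

    def find_element(self, by, value):
        self.log.append(("find_element", by, value))
        return (by, value)  # fake element handle

    def click(self, element):
        self.log.append(("click", element))

    def send_keys(self, element, text):
        self.log.append(("send_keys", element, text))


def fake_llm(instruction):
    """Placeholder for the model call: returns canned code for any instruction."""
    return (
        'env.get("https://example.com")\n'
        'box = env.find_element("name", "q")\n'
        'env.send_keys(box, "browser automation")\n'
        'env.click(env.find_element("id", "submit"))\n'
    )


def run_instruction(instruction, env):
    """Translate an instruction to code and execute it against env."""
    code = fake_llm(instruction)
    # Only `env` is passed in explicitly, but exec is not a sandbox:
    # Python's builtins are still reachable from the executed code.
    exec(code, {"env": env})
    return code


env = StubEnv()
run_instruction("Go to example.com and search for browser automation", env)
for call in env.log:
    print(call)
```

The recorded log shows the sequence of environment actions the generated code performed, which is the part a real run would forward to Selenium.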

Quick Start & Requirements

Install via pip install browserpilot.
Requires downloading and placing chromedriver in the same directory as the script.
Requires setting the OPENAI_API_KEY environment variable.
Official documentation and examples are available within the repository.
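The two setup requirements above can be checked before a run with a small helper. This is a sketch and not part of BrowserPilot: missing_prerequisites is a hypothetical function, and accepting chromedriver.exe as an alternative file name is an assumption for Windows users.

```python
import os
from pathlib import Path


def missing_prerequisites(script_dir, environ=os.environ):
    """Return a list of unmet setup requirements (empty if all are met)."""
    problems = []

    # The API key must be present and non-empty in the environment.
    if not environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # chromedriver is expected next to the script that starts the agent.
    candidates = ("chromedriver", "chromedriver.exe")
    if not any((Path(script_dir) / name).is_file() for name in candidates):
        problems.append(f"chromedriver not found in {script_dir}")

    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(Path.cwd()):
        print("missing:", problem)
```

Passing the environment as a parameter keeps the check easy to exercise without changing the real process environment.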

Highlighted Details

Supports defining reusable functions within prompts using BEGIN_FUNCTION/END_FUNCTION blocks.
Can load instructions from YAML or JSON files for batch processing.
Includes experimental memory capabilities for synthesizing information from browsed pages.
Offers options to mitigate bot detection and improve compatibility with website anti-bot measures.
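To make the reusable-function feature concrete, the sketch below separates BEGIN_FUNCTION/END_FUNCTION blocks from the remaining instructions in a prompt. The block layout (a function name on the BEGIN_FUNCTION line, one instruction per line) is an assumption, split_prompt is a hypothetical helper, and the project's own parser may work differently.

```python
import re

# Assumed layout: "BEGIN_FUNCTION <name>", body lines, then "END_FUNCTION".
BLOCK = re.compile(
    r"^BEGIN_FUNCTION[ \t]+(\w+)[ \t]*\n(.*?)^END_FUNCTION[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def split_prompt(prompt):
    """Return (functions, steps): named function bodies and top-level lines."""
    functions = {
        match.group(1): match.group(2).strip().splitlines()
        for match in BLOCK.finditer(prompt)
    }
    # Whatever is left after removing the blocks is the main instruction list.
    remainder = BLOCK.sub("", prompt)
    steps = [line.strip() for line in remainder.splitlines() if line.strip()]
    return functions, steps


prompt = """\
BEGIN_FUNCTION search_site
Go to example.com.
Type the query into the search box and press enter.
END_FUNCTION
Run search_site with the query "selenium grid".
Print the title of the first result.
"""

functions, steps = split_prompt(prompt)
print(functions)
print(steps)
```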

Maintenance & Community

Notable updates include Selenium Grid support, LlamaIndex integration, and a switch to gpt-3.5-turbo to reduce cost. Contributions are welcome, particularly to expand the prompt library and the core agent capabilities.

Licensing & Compatibility

The software is provided "AS IS" without warranty. The README does not state a license explicitly. The disclaimer resembles the wording used in permissive open-source licenses, but a warranty disclaimer alone does not establish the license terms, so confirm them in the repository before reuse.

Limitations & Caveats

This package runs LLM-generated Python code with exec, which the project explicitly notes is unsafe. Generated code executes with the same privileges as the calling process, so users should review what they run and where they run it. The quality of the prompt translation also varies: instructions need to be as precise as those given to a code assistant such as Copilot.
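One way to reason about the exec risk is to inspect generated code before running it. The sketch below is not a BrowserPilot feature; screen, FORBIDDEN_NODES, and FORBIDDEN_NAMES are hypothetical names, and a deny-list like this is easy to bypass, so it only illustrates the kind of code that deserves a second look and is not a sandbox.

```python
import ast

FORBIDDEN_NODES = (ast.Import, ast.ImportFrom)
FORBIDDEN_NAMES = {"exec", "eval", "open", "compile", "__import__"}


def screen(code):
    """Return reasons to refuse the code; an empty list means nothing was flagged."""
    reasons = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, FORBIDDEN_NODES):
            reasons.append(f"line {node.lineno}: import statement")
        elif isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            reasons.append(f"line {node.lineno}: use of {node.id}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            reasons.append(f"line {node.lineno}: dunder attribute {node.attr}")
    return reasons


safe = 'env.click(env.find_element("id", "submit"))'
risky = 'import os\nos.remove("notes.txt")'

print(screen(safe))   # []
print(screen(risky))  # ['line 1: import statement']
```

Running the agent in a container or a throwaway virtual machine is a stronger precaution than static screening, since it limits what any executed code can reach.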

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days