Web browsing agent controlled by natural language
BrowserPilot enables natural language control of web browsers for automation and scraping tasks, targeting developers and researchers who want to avoid writing brittle code. It translates plain English instructions into executable Selenium commands via an LLM, simplifying complex web interactions.
How It Works
The core approach leverages GPT-3 (or GPT-3.5-turbo) to interpret natural language instructions and generate Python Selenium code. It exposes a set of predefined environment actions (e.g., env.click, env.send_keys, env.find_element) to the LLM, which then constructs sequences of these actions. This allows for dynamic, intent-driven automation rather than rigid, pre-scripted workflows. The system can also manage conversational memory and retrieve information from browsed pages.
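The flow described above, where generated code calls a fixed set of env actions, can be sketched without a browser. This is an illustration only: RecordingEnv is a hypothetical stand-in that logs calls instead of driving Selenium, the argument shapes of find_element, click, and send_keys are assumptions, and the generated_code string is hand-written rather than real LLM output.

```python
class RecordingEnv:
    """Hypothetical stand-in for the browser environment.

    It records each action instead of driving a real browser, so the
    sketch runs without Selenium or chromedriver.
    """

    def __init__(self):
        self.log = []

    def find_element(self, by, value):
        # Assumed signature; returns a placeholder handle, not a WebElement.
        self.log.append(("find_element", by, value))
        return f"<{by}={value}>"

    def click(self, element):
        self.log.append(("click", element))

    def send_keys(self, element, text):
        self.log.append(("send_keys", element, text))


# Hand-written example of the kind of code an LLM might return for
# "type 'buffalo' into the search box and click the search button".
generated_code = """
box = env.find_element(by="name", value="q")
env.send_keys(box, "buffalo")
button = env.find_element(by="name", value="btnK")
env.click(button)
"""

env = RecordingEnv()
# The generated text only sees the names placed in this namespace
# (plus Python builtins, which exec adds automatically).
exec(generated_code, {"env": env})

for entry in env.log:
    print(entry)
```

Because the generated text is ordinary Python, anything placed in the exec namespace is reachable from it, which is why the set of exposed actions is kept small.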
Quick Start & Requirements
Install: pip install browserpilot
Requires chromedriver in the same directory as the script.
Requires the OPENAI_API_KEY environment variable.
Highlighted Details
Prompts can include BEGIN_FUNCTION/END_FUNCTION blocks.
Maintenance & Community
The project has seen recent activity, with updates adding Selenium Grid support, Llama Index integration, and a switch to gpt-3.5-turbo for cost reduction. Contributions are welcome for prompt library expansion and core agent capabilities.
Licensing & Compatibility
The software is provided "AS IS" without warranty. The README does not state a license explicitly, though the wording of the disclaimer suggests a permissive open-source license.
Limitations & Caveats
This package executes LLM-generated Python code using exec, which the project itself notes is unsafe. Users should treat this as a security risk and run it with caution. The quality of the translation from prompt to code also varies, so instructions need to be phrased precisely, much as when using a code assistant like Copilot.
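One way to reduce (not eliminate) the risk is to inspect generated code before passing it to exec. The sketch below is not part of BrowserPilot; the violations helper and its rules are assumptions chosen for illustration. It uses Python's ast module to flag imports, dunder attribute access, and any call that is not of the form env.<action>(...). It is a basic guard, not a sandbox.

```python
import ast


def violations(code):
    """Return a list of reasons the code should not be executed.

    Hypothetical policy: only calls of the form env.<action>(...) are
    allowed; imports and dunder attribute access are rejected.
    """
    problems = []
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"line {node.lineno}: import statement")
        elif isinstance(node, ast.Call):
            func = node.func
            is_env_call = (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "env"
            )
            if not is_env_call:
                problems.append(f"line {node.lineno}: call outside env.<action>")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"line {node.lineno}: dunder attribute access")
    return problems


safe_code = 'box = env.find_element(by="name", value="q")\nenv.click(box)\n'
unsafe_code = 'import os\nos.system("echo hi")\n'

print(violations(safe_code))
print(violations(unsafe_code))
```

A check like this rejects a lot of legitimate code (for example, calls to len or str), so in practice the allowed set would need tuning. Running the agent in an isolated container or VM remains the safer option.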