Web scraping agent with GPT-4V for browser automation
Top 92.1% on sourcepulse
This project provides an AI-powered web agent capable of visual understanding, navigation, and task execution within a browser. It's designed for users needing automated web scraping, data extraction, and interactive browsing experiences, leveraging GPT-4V's visual capabilities.
How It Works
The agent operates in three main stages. First, it uses Puppeteer with a stealth plugin to capture full-page screenshots of websites, designed to bypass anti-bot measures. Second, it processes these screenshots using a Python script that integrates with GPT-4V for OCR and context-aware data extraction based on user-defined prompts. Finally, it enables real-time, conversational interaction with the web agent, allowing users to guide it through Bing searches and complex web tasks.
Quick Start & Requirements
npm i
(for Node.js part) and pip install -r requirements.txt
(for Python part)..env.template
to .env
and add OPENAI_API_KEY
.executablePath
and userDataDir
in snapshot.js
for your Chrome/Chrome Canary installation.node snapshot.js "<URL>"
python gpt4v_scraper.py
node web_agent.js
Highlighted Details
Maintenance & Community
The project is maintained by vdutts7. No specific community channels or roadmap details are provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. It mentions "FREE 200 USD cloud credits" via a DigitalOcean banner, but this is promotional and not a software license. Compatibility for commercial use is not specified.
Limitations & Caveats
The project appears to be in an early stage, with the README suggesting manual configuration of browser paths and environment variables. The effectiveness of the "stealth plugin" against sophisticated anti-bot measures is not benchmarked. The project also includes commentary on website paywalls, which may be considered unprofessional by some users.
1 year ago
Inactive