gpt4V-scraper by vdutts7

Web scraping agent with GPT-4V for browser automation

Created 2 years ago

296 stars

Top 89.6% on SourcePulse

Project Summary

This project provides an AI-powered web agent capable of visual understanding, navigation, and task execution within a browser. It's designed for users needing automated web scraping, data extraction, and interactive browsing experiences, leveraging GPT-4V's visual capabilities.

How It Works

The agent operates in three main stages. First, it uses Puppeteer with a stealth plugin to capture full-page screenshots of websites, designed to bypass anti-bot measures. Second, it processes these screenshots using a Python script that integrates with GPT-4V for OCR and context-aware data extraction based on user-defined prompts. Finally, it enables real-time, conversational interaction with the web agent, allowing users to guide it through Bing searches and complex web tasks.

Quick Start & Requirements

Install dependencies: npm i (for Node.js part) and pip install -r requirements.txt (for Python part).
Set up environment variables: Copy .env.template to .env and add OPENAI_API_KEY.
Configure browser path: Update executablePath and userDataDir in snapshot.js for your Chrome/Chrome Canary installation.
Run screenshot/scrape: node snapshot.js "<URL>"
Run Python extraction: python gpt4v_scraper.py
Run web agent: node web_agent.js
Requires Node.js, Python 3, and an OpenAI API key.

Highlighted Details

Utilizes Puppeteer with a stealth plugin for robust web scraping.
Integrates GPT-4V for visual understanding and text extraction from screenshots.
Supports real-time conversational control of the web agent for guided browsing and search.
Allows customization of browser paths and user data directories for session management.

Maintenance & Community

The project is maintained by vdutts7. No specific community channels or roadmap details are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. It mentions "FREE 200 USD cloud credits" via a DigitalOcean banner, but this is promotional and not a software license. Compatibility for commercial use is not specified.

Limitations & Caveats

The project appears to be in an early stage, with the README suggesting manual configuration of browser paths and environment variables. The effectiveness of the "stealth plugin" against sophisticated anti-bot measures is not benchmarked. The project also includes commentary on website paywalls, which may be considered unprofessional by some users.

gpt4V-scraper by vdutts7

Explore Similar Projects

oxylabs-ai-studio-py by oxylabs

dendrite-python-sdk by dendrite-systems

agent-browse by browserbase

ActGPT by ethanhe42

BrowserGPT by mayt

browser-agent by m1guelpf

sentient by sentient-engineering

browserpilot by handrew

TheAgenticBrowser by TheAgenticAI

natbot by nat

crawlee-python by apify

steel-browser by steel-dev