gpt4V-scraper  by vdutts7

Web scraping agent with GPT-4V for browser automation

created 1 year ago
288 stars

Top 92.1% on sourcepulse

GitHubView on GitHub
Project Summary

This project provides an AI-powered web agent capable of visual understanding, navigation, and task execution within a browser. It's designed for users needing automated web scraping, data extraction, and interactive browsing experiences, leveraging GPT-4V's visual capabilities.

How It Works

The agent operates in three main stages. First, it uses Puppeteer with a stealth plugin to capture full-page screenshots of websites, designed to bypass anti-bot measures. Second, it processes these screenshots using a Python script that integrates with GPT-4V for OCR and context-aware data extraction based on user-defined prompts. Finally, it enables real-time, conversational interaction with the web agent, allowing users to guide it through Bing searches and complex web tasks.

Quick Start & Requirements

  • Install dependencies: npm i (for Node.js part) and pip install -r requirements.txt (for Python part).
  • Set up environment variables: Copy .env.template to .env and add OPENAI_API_KEY.
  • Configure browser path: Update executablePath and userDataDir in snapshot.js for your Chrome/Chrome Canary installation.
  • Run screenshot/scrape: node snapshot.js "<URL>"
  • Run Python extraction: python gpt4v_scraper.py
  • Run web agent: node web_agent.js
  • Requires Node.js, Python 3, and an OpenAI API key.

Highlighted Details

  • Utilizes Puppeteer with a stealth plugin for robust web scraping.
  • Integrates GPT-4V for visual understanding and text extraction from screenshots.
  • Supports real-time conversational control of the web agent for guided browsing and search.
  • Allows customization of browser paths and user data directories for session management.

Maintenance & Community

The project is maintained by vdutts7. No specific community channels or roadmap details are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. It mentions "FREE 200 USD cloud credits" via a DigitalOcean banner, but this is promotional and not a software license. Compatibility for commercial use is not specified.

Limitations & Caveats

The project appears to be in an early stage, with the README suggesting manual configuration of browser paths and environment variables. The effectiveness of the "stealth plugin" against sophisticated anti-bot measures is not benchmarked. The project also includes commentary on website paywalls, which may be considered unprofessional by some users.

Health Check
Last commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
7 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.