Tool for answering questions using website screenshots and GPT-4 Vision API
Top 58.0% on sourcepulse
This project provides a tool for answering questions based on website content by leveraging GPT-4 Vision API and Puppeteer. It's designed for users who need to extract information from web pages programmatically, especially when visual context or interactive elements are involved.
How It Works
The tool operates by first using Puppeteer to navigate to a specified URL and capture a screenshot of the webpage. This screenshot is then fed into the GPT-4 Vision API, which analyzes the visual information to answer user-provided questions. The JavaScript version extends this by enabling the tool to interact with web pages, such as clicking on links, to navigate and gather information more dynamically.
Quick Start & Requirements
npm install
npm install
(for Puppeteer) and pip install -r requirements.txt
node vision_crawl.js
python3 vision_crawl.py
Highlighted Details
Maintenance & Community
The project appears to be a personal or small-team effort with no explicit mention of maintainers, community channels, or a roadmap.
Licensing & Compatibility
The README does not specify a license.
Limitations & Caveats
The tool's effectiveness is dependent on the GPT-4 Vision API's capabilities and potential rate limits. The Python version is limited to single URL processing without interactive navigation.
1 year ago
Inactive