gpt4v-browsing by unconv

Tool for answering questions using website screenshots and GPT-4 Vision API

created 1 year ago

563 stars

Top 58.0% on sourcepulse

Project Summary

This project provides a tool for answering questions based on website content by leveraging GPT-4 Vision API and Puppeteer. It's designed for users who need to extract information from web pages programmatically, especially when visual context or interactive elements are involved.

How It Works

The tool operates by first using Puppeteer to navigate to a specified URL and capture a screenshot of the webpage. This screenshot is then fed into the GPT-4 Vision API, which analyzes the visual information to answer user-provided questions. The JavaScript version extends this by enabling the tool to interact with web pages, such as clicking on links, to navigate and gather information more dynamically.

Quick Start & Requirements

Installation:
- JavaScript version: npm install
- Python version: npm install (for Puppeteer) and pip install -r requirements.txt
Execution:
- JavaScript version: node vision_crawl.js
- Python version: python3 vision_crawl.py
Prerequisites: Requires Node.js, npm, Python 3, and an OpenAI API key with access to GPT-4 Vision.

Highlighted Details

Automated web crawling and screenshotting for visual analysis.
GPT-4 Vision API integration for question answering based on visual data.
Puppeteer for headless browser automation, including link clicking in the JavaScript version.
Supports natural language queries about website content.

Maintenance & Community

The project appears to be a personal or small-team effort with no explicit mention of maintainers, community channels, or a roadmap.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The tool's effectiveness is dependent on the GPT-4 Vision API's capabilities and potential rate limits. The Python version is limited to single URL processing without interactive navigation.

gpt4v-browsing by unconv

Explore Similar Projects

gpt4V-scraper by vdutts7

AutoNode by TransformerOptimus

ActGPT by ethanhe42

visualwebarena by web-arena-x

gpt-assistant by BuilderIO

browser-agent by m1guelpf

searchGPT by michaelthwan

WebVoyager by MinorJerry

vimGPT by ishan0102

tap4-ai-crawler by 6677-ai

Chrome-GPT by richardyc

clarity-ai by mckaywrigley