Python library for AI-powered web scraping
ScrapeGraphAI is a Python library designed for web scraping and data extraction from various sources, including websites and local files (HTML, XML, JSON, Markdown). It leverages Large Language Models (LLMs) and graph logic to automate the creation of scraping pipelines, allowing users to specify desired information through natural language prompts. This simplifies complex scraping tasks for developers and data analysts.
How It Works
The library employs a graph-based approach where different "graphs" represent distinct scraping pipelines. The core SmartScraperGraph takes a user prompt and a source URL to extract information, abstracting away the complexities of parsing and LLM interaction. It supports various LLM providers (OpenAI, Ollama, Groq, Azure, Gemini) and offers parallel processing capabilities for multi-page scraping.
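To make the graph idea concrete, here is a minimal, standard-library-only sketch of a scraping pipeline expressed as a chain of nodes that pass a shared state along. It is not ScrapeGraphAI's implementation: `LinearGraph`, `fetch_node`, `parse_node`, `make_answer_node`, and `fake_llm` are hypothetical names chosen for illustration, the "fetch" step simply treats the source as raw HTML, and the LLM is replaced by a stub.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


class _TextExtractor(HTMLParser):
    """Collects the visible text fragments of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def fetch_node(state: State) -> State:
    # Assumption: a real pipeline would download state["source"] here.
    # This sketch treats the source as raw HTML to stay offline.
    state["html"] = state["source"]
    return state


def parse_node(state: State) -> State:
    # Reduce the HTML to plain text so the model sees less markup.
    parser = _TextExtractor()
    parser.feed(state["html"])
    parser.close()
    state["text"] = " ".join(parser.chunks)
    return state


def make_answer_node(llm: Callable[[str, str], Any]) -> Callable[[State], State]:
    # Wrap any "LLM" callable (prompt, context) -> answer as a pipeline node.
    def answer_node(state: State) -> State:
        state["answer"] = llm(state["prompt"], state["text"])
        return state

    return answer_node


@dataclass
class LinearGraph:
    """A hypothetical, strictly sequential graph: each node feeds the next."""

    nodes: List[Callable[[State], State]]

    def run(self, prompt: str, source: str) -> Any:
        state: State = {"prompt": prompt, "source": source}
        for node in self.nodes:
            state = node(state)
        return state["answer"]


def fake_llm(prompt: str, context: str) -> Dict[str, str]:
    # Stand-in for a real model call; it only echoes its inputs.
    return {"prompt": prompt, "context": context}
```

Calling `LinearGraph([fetch_node, parse_node, make_answer_node(fake_llm)]).run(prompt, html)` runs the three steps in order. Swapping `fake_llm` for a real provider client, or adding nodes, changes the pipeline without touching the caller, which is the kind of abstraction the graph approach is meant to provide.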
Quick Start & Requirements
pip install scrapegraphai
playwright install
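After installation, a first run might look like the sketch below. It assumes the `SmartScraperGraph(prompt=..., source=..., config=...)` constructor and its `run()` method as commonly shown in the project's own examples. The configuration keys (`llm`, `api_key`, `model`, `verbose`, `headless`), the model identifier, and the `OPENAI_API_KEY` environment variable are assumptions that may differ between versions, and `build_graph_config` and `scrape` are hypothetical helper names.

```python
import os
from typing import Any, Dict


def build_graph_config(
    model: str = "openai/gpt-4o-mini",      # assumed model identifier format
    api_key_env: str = "OPENAI_API_KEY",    # assumed environment variable name
) -> Dict[str, Any]:
    """Build a configuration dict in the shape the library is assumed to accept."""
    return {
        "llm": {
            "api_key": os.environ.get(api_key_env, ""),
            "model": model,
        },
        "verbose": False,
        "headless": True,
    }


def scrape(prompt: str, source: str, config: Dict[str, Any]) -> Any:
    """Run one prompt against one source URL (requires scrapegraphai installed)."""
    # Imported lazily so this module loads even without the library present.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=config)
    return graph.run()


# Example call (needs network access and a valid API key):
# result = scrape(
#     "Extract the main heading and a one-line summary",
#     "https://example.com",
#     build_graph_config(),
# )
# print(result)
```

Reading the API key from the environment keeps credentials out of source files. For a local provider such as Ollama, the same helper could be called with a different `model` string, subject to the library's current naming scheme.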
Highlighted Details
Graph types include SmartScraperGraph, SearchGraph, SpeechGraph, ScriptCreatorGraph, and their multi-page variants.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The library is intended for data exploration and research purposes; users are responsible for ethical usage. Telemetry is collected by default, though it can be disabled.