Scrapegraph-ai by ScrapeGraphAI

Python library for AI-powered web scraping

created 1 year ago
20,814 stars

Top 2.1% on sourcepulse

Project Summary

ScrapeGraphAI is a Python library designed for web scraping and data extraction from various sources, including websites and local files (HTML, XML, JSON, Markdown). It leverages Large Language Models (LLMs) and graph logic to automate the creation of scraping pipelines, allowing users to specify desired information through natural language prompts. This simplifies complex scraping tasks for developers and data analysts.

How It Works

The library employs a graph-based approach where different "graphs" represent distinct scraping pipelines. The core SmartScraperGraph takes a user prompt and a source URL to extract information, abstracting away the complexities of parsing and LLM interaction. It supports various LLM providers (OpenAI, Ollama, Groq, Azure, Gemini) and offers parallel processing capabilities for multi-page scraping.
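
A minimal sketch of how a prompt and a source URL might be handed to SmartScraperGraph. The import path, constructor arguments, and `run()` call follow the pattern shown in the project's README as best recalled; the model identifiers, the `OPENAI_API_KEY` environment variable, and the helper names `build_graph_config` and `scrape` are assumptions made for illustration and may not match the installed version.

```python
import os


def build_graph_config(provider: str = "openai") -> dict:
    """Build a config dict in the shape the project's README uses.

    The model names below are illustrative assumptions; check the
    library's documentation for the identifiers your version accepts.
    """
    if provider == "ollama":
        # Local model served by Ollama, so no API key is included.
        llm = {"model": "ollama/llama3.2", "model_tokens": 8192}
    else:
        llm = {
            "api_key": os.environ.get("OPENAI_API_KEY", ""),
            "model": "openai/gpt-4o-mini",
        }
    return {"llm": llm, "verbose": False, "headless": True}


def scrape(prompt: str, source: str, provider: str = "openai"):
    """Run one scraping pipeline for a natural-language prompt and a source."""
    # Imported lazily so this module still loads when the library
    # is not installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(
        prompt=prompt,
        source=source,
        config=build_graph_config(provider),
    )
    return graph.run()


# Example call (needs the library, a browser backend, and LLM access):
# scrape("Extract the page title and a one-sentence summary", "https://example.com")
```

Keeping the provider choice inside the config dict means the same `scrape` call can switch between a hosted and a local model without touching the pipeline code.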

Quick Start & Requirements

Highlighted Details

  • Multiple scraping pipelines available: SmartScraperGraph, SearchGraph, SpeechGraph, ScriptCreatorGraph, and their multi-page variants.
  • Supports parallel LLM calls for faster multi-page scraping.
  • Offers SDKs for Python and Node.js, and a dedicated API for integration.
  • Collects anonymous usage metrics, with an opt-out option.
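
The idea behind parallel LLM calls for multi-page scraping can be sketched in plain Python. This is not the library's internal mechanism: `fake_llm_extract` is a hypothetical stand-in for a real model call, and the concurrency limit is an arbitrary illustrative value.

```python
import asyncio


async def fake_llm_extract(url: str, prompt: str) -> dict:
    """Hypothetical stand-in for one LLM extraction call on one page."""
    await asyncio.sleep(0.01)  # simulate network and model latency
    return {"url": url, "answer": f"{prompt} @ {url}"}


async def scrape_many(urls, prompt, max_concurrency=3):
    """Extract from several pages concurrently, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # At most max_concurrency calls are in flight at once.
        async with semaphore:
            return await fake_llm_extract(url, prompt)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


def run_scrape_many(urls, prompt, max_concurrency=3):
    """Synchronous entry point for scripts that are not already async."""
    return asyncio.run(scrape_many(urls, prompt, max_concurrency))
```

Because the calls spend most of their time waiting on I/O, running them concurrently shortens total wall-clock time, while the semaphore keeps the request rate within whatever limit the LLM provider allows.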

Maintenance & Community

  • Active development with contributions from multiple authors.
  • Community channels include Discord and social media (LinkedIn, Twitter).
  • Contributing guidelines are available.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The library is intended for data exploration and research purposes; users are responsible for ethical usage. Telemetry is collected by default, though it can be disabled.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 5
  • Star history: 1,516 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 7 more.

firecrawl by mendableai

1.9%
44k
API service for turning websites into LLM-ready data
created 1 year ago
updated 1 day ago