firecrawl  by mendableai

API service for turning websites into LLM-ready data

created 1 year ago
43,866 stars

Top 0.6% on sourcepulse

Project Summary

Firecrawl provides an API service for scraping, crawling, and extracting data from websites, transforming it into LLM-ready formats like Markdown or structured JSON. It targets developers building AI applications who need to ingest web content efficiently, offering advanced capabilities to handle dynamic content, anti-bot measures, and custom extraction logic.

How It Works

Firecrawl's scraping engine handles JavaScript-rendered content and complex site structures. It exposes several distinct operations: scrape fetches a single URL, crawl recursively explores subpages, map discovers all URLs on a site, and search runs a web search and retrieves the content of the results. The extract feature uses LLMs to pull specific data points out of scraped content, guided by either a predefined schema or a natural-language prompt.
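
The two extraction modes described above (schema-driven and prompt-driven) can be sketched as a request-body builder. This is a minimal illustration only: the helper build_extract_payload and the field names "urls", "schema", and "prompt" are assumptions made for this example, not confirmed Firecrawl API fields, so check the official API reference before relying on them.

```python
import json
from typing import Any, Optional


def build_extract_payload(
    urls: list[str],
    schema: Optional[dict[str, Any]] = None,
    prompt: Optional[str] = None,
) -> dict[str, Any]:
    """Build a hypothetical extract request body.

    The keys "urls", "schema", and "prompt" are illustrative assumptions.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    if schema is None and prompt is None:
        raise ValueError("provide a schema, a prompt, or both")
    payload: dict[str, Any] = {"urls": list(urls)}
    if schema is not None:
        payload["schema"] = schema
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


# Schema-driven: the caller fixes the output shape up front (JSON Schema).
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name"],
}
schema_payload = build_extract_payload(
    ["https://example.com/item"], schema=product_schema
)

# Prompt-driven: the caller describes the wanted data in plain language.
prompt_payload = build_extract_payload(
    ["https://example.com/item"],
    prompt="Extract the product name and price.",
)

print(json.dumps(schema_payload, indent=2))
```

A schema suits pipelines that need a stable output shape, while a prompt is quicker for exploratory work where the fields are not yet settled.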

Quick Start & Requirements

  • API Usage: Sign up at firecrawl.dev for an API key.
  • Local Self-Hosting: Refer to the self-hosting guide.
  • SDKs: Python (pip install firecrawl-py), Node.js (npm install @mendable/firecrawl-js).
  • Dependencies: None explicitly stated for API usage. Local hosting requirements are detailed in the self-hosting guide.
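
As a rough sketch of API usage without an SDK, the snippet below builds an authenticated scrape request using only the Python standard library. The base URL, the /v1/scrape path, the "url" and "formats" body fields, and the Bearer-token header are assumptions about the hosted API at the time of writing; the helper names are hypothetical. Verify all of them against the current API reference.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current API reference before use.
API_BASE = "https://api.firecrawl.dev"
SCRAPE_PATH = "/v1/scrape"


def build_scrape_request(url, api_key, formats=("markdown",)):
    """Return an unsent POST request asking for `url` in the given formats."""
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + SCRAPE_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def scrape(url, api_key, timeout=30.0):
    """Send the request and decode the JSON response (performs network I/O)."""
    request = build_scrape_request(url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling scrape("https://example.com", api_key) with a key obtained from firecrawl.dev would send the request; building the request separately keeps the payload easy to inspect and test offline.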

Highlighted Details

  • Supports scraping, crawling, mapping website URLs, and web searching.
  • Extracts data into LLM-ready formats: Markdown, structured JSON, screenshots, HTML, and links.
  • Handles dynamic content, proxies, and anti-bot mechanisms.
  • Offers "Actions" for interacting with pages (click, scroll, write, press) before scraping (cloud-only).
  • Includes batch scraping for processing multiple URLs concurrently.
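
To make the "Actions" idea above concrete, here is a small sketch that assembles a sequence of page interactions to run before scraping. The four action types (click, scroll, write, press) come from the list above; the per-action field names ("selector", "direction", "text", "key"), the "actions" key, and both helper functions are assumptions for illustration rather than the documented request format.

```python
from typing import Any

# Action types named in the feature list; the allowed field names for each
# type are assumptions made for this sketch.
ALLOWED_FIELDS = {
    "click": {"selector"},
    "scroll": {"direction"},
    "write": {"text"},
    "press": {"key"},
}


def make_action(action_type: str, **fields: str) -> dict[str, Any]:
    """Validate and build a single action dictionary."""
    if action_type not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported action type: {action_type!r}")
    unknown = set(fields) - ALLOWED_FIELDS[action_type]
    if unknown:
        raise ValueError(
            f"unexpected fields for {action_type}: {sorted(unknown)}"
        )
    return {"type": action_type, **fields}


def build_scrape_with_actions(
    url: str, actions: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build a hypothetical scrape body that runs `actions` in order first."""
    return {"url": url, "formats": ["markdown"], "actions": list(actions)}


payload = build_scrape_with_actions(
    "https://example.com/search",
    [
        make_action("click", selector="#search-box"),
        make_action("write", text="firecrawl"),
        make_action("press", key="Enter"),
        make_action("scroll", direction="down"),
    ],
)
```

Validating action types on the client side catches typos before a request is spent; the order of the list is the order in which the interactions would be expected to run.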

Maintenance & Community

  • Actively developed by Mendable AI.
  • Community channels and contribution guides are available.
  • Requests for new SDKs or integrations can be made via issues.

Licensing & Compatibility

  • Core project licensed under AGPL-3.0.
  • SDKs and some UI components are licensed under MIT.
  • AGPL-3.0's copyleft provisions may impact integration into closed-source commercial applications.

Limitations & Caveats

  • The repository is still in development: full self-hosted deployment is not yet ready, though the project can be run locally.
  • The "Actions" page-interaction feature is cloud-only.
  • Users are responsible for adhering to website scraping policies and robots.txt.
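
One way to act on the robots.txt responsibility noted above is to check a URL against a site's rules before requesting it. The sketch below uses Python's standard urllib.robotparser and is independent of Firecrawl; the helper is_allowed, the sample rules, and the user agent name "my-crawler" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules for illustration; a real check would download the site's
# own /robots.txt first.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

blog_ok = is_allowed(SAMPLE_ROBOTS, "my-crawler", "https://example.com/blog/post")
private_ok = is_allowed(SAMPLE_ROBOTS, "my-crawler", "https://example.com/private/data")
print(blog_ok, private_ok)
```

robots.txt covers only part of a site's scraping policy; terms of service and rate limits still need to be reviewed separately.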

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 124
  • Issues (30d): 27
  • Star history: 7,173 stars in the last 90 days