crawlee  by apify

Web scraping/browser automation library for building reliable crawlers

created 9 years ago
18,621 stars

Top 2.5% on sourcepulse

Project Summary

Crawlee is a Node.js library for web scraping and browser automation, designed for building reliable and efficient crawlers. It targets developers who need to extract website data for AI, LLM, RAG, or GPT applications, and it supports a range of data formats and browser automation tools.

How It Works

Crawlee provides a unified interface for both HTTP and headless browser crawling, abstracting away the complexities of tools like Playwright and Puppeteer. It features a persistent queue for managing URLs, pluggable storage for scraped data, and built-in proxy rotation and session management. Together, these features let crawlers mimic human behavior, bypass bot protections, and scale automatically.
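To make the queue and proxy rotation ideas concrete, here is a minimal conceptual sketch. It is not Crawlee's API or implementation: DedupQueue and ProxyRotator are hypothetical names, the queue lives in memory rather than being persistent, and the proxy URLs are placeholders.

```typescript
// Conceptual sketch only: these classes are hypothetical and are NOT Crawlee's API.

/** A URL queue that skips duplicates and hands out each URL once. */
class DedupQueue {
  private readonly seen = new Set<string>();
  private readonly pending: string[] = [];

  /** Adds a URL unless an equivalent one was already enqueued. Returns true if added. */
  add(url: string): boolean {
    const key = DedupQueue.normalize(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  /** Returns the next URL in FIFO order, or undefined when the queue is empty. */
  next(): string | undefined {
    return this.pending.shift();
  }

  /** Drops the fragment so that /page and /page#top count as one URL. */
  private static normalize(url: string): string {
    const parsed = new URL(url);
    parsed.hash = "";
    return parsed.toString();
  }
}

/** Cycles through a fixed list of proxy URLs, one per request. */
class ProxyRotator {
  private readonly proxies: readonly string[];
  private index = 0;

  constructor(proxies: readonly string[]) {
    if (proxies.length === 0) throw new Error("at least one proxy is required");
    this.proxies = proxies;
  }

  /** Returns the current proxy and advances to the next one, wrapping around. */
  next(): string {
    const proxy = this.proxies[this.index] as string;
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

// Usage: enqueue URLs (the second is a duplicate after normalization),
// then pair each queued URL with the next proxy in the rotation.
const queue = new DedupQueue();
queue.add("https://example.com/");
queue.add("https://example.com/#about");
queue.add("https://example.com/docs");

const rotator = new ProxyRotator(["http://proxy-1.example:8000", "http://proxy-2.example:8000"]);
for (let url = queue.next(); url !== undefined; url = queue.next()) {
  console.log(`fetch ${url} via ${rotator.next()}`);
}
```

A real crawler would also persist the queue to disk and retire proxies or sessions that get blocked; the sketch leaves both out to keep the two core ideas visible.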

Quick Start & Requirements

  • Install via npm: npm install crawlee playwright
  • Requires Node.js 16 or higher.
  • Full documentation: https://crawlee.dev/docs/introduction
  • CLI quick start: npx crawlee create my-crawler

Highlighted Details

  • Supports Playwright, Puppeteer, Cheerio, JSDOM, and raw HTTP.
  • Offers zero-config HTTP/2, TLS fingerprint replication, and automatic browser management.
  • Features customizable lifecycles with hooks and configurable routing.
  • Includes ready-to-deploy Dockerfiles.
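
The hooks and routing mentioned in the list above can be pictured with a small conceptual sketch. It assumes routing means dispatching each request to a handler chosen by a label, with hooks that run before the handler. LabelRouter, CrawlContext, and the "DETAIL" label are hypothetical names for illustration, not Crawlee's own API.

```typescript
// Conceptual sketch only: LabelRouter and CrawlContext are hypothetical, NOT Crawlee's API.

/** The minimal information a handler receives about one request. */
interface CrawlContext {
  url: string;
  label?: string;
}

type Handler = (ctx: CrawlContext) => string;
type Hook = (ctx: CrawlContext) => void;

class LabelRouter {
  private readonly handlers = new Map<string, Handler>();
  private readonly preHooks: Hook[] = [];
  // Fallback used when a request has no label or no handler matches it.
  private defaultHandler: Handler = (ctx) => `unhandled:${ctx.url}`;

  /** Registers the handler for requests carrying the given label. */
  addHandler(label: string, handler: Handler): void {
    this.handlers.set(label, handler);
  }

  /** Replaces the fallback handler. */
  setDefaultHandler(handler: Handler): void {
    this.defaultHandler = handler;
  }

  /** Registers a hook that runs before every handler, in registration order. */
  addPreHook(hook: Hook): void {
    this.preHooks.push(hook);
  }

  /** Runs all pre-hooks, then the handler matching the label (or the fallback). */
  dispatch(ctx: CrawlContext): string {
    for (const hook of this.preHooks) hook(ctx);
    const matched = ctx.label !== undefined ? this.handlers.get(ctx.label) : undefined;
    return (matched ?? this.defaultHandler)(ctx);
  }
}

// Usage: log every visited URL in a hook, then route detail pages separately.
const router = new LabelRouter();
const visited: string[] = [];
router.addPreHook((ctx) => { visited.push(ctx.url); });
router.addHandler("DETAIL", (ctx) => `detail:${ctx.url}`);
router.setDefaultHandler((ctx) => `listing:${ctx.url}`);

console.log(router.dispatch({ url: "https://example.com/item/1", label: "DETAIL" }));
console.log(router.dispatch({ url: "https://example.com/" }));
```

Keeping the dispatch logic in one place means page-specific code stays in small handlers, and cross-cutting concerns such as logging or header tweaks live in hooks.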

Maintenance & Community

  • Developed by Apify.
  • Support channels: GitHub Issues, Stack Overflow, GitHub Discussions, Discord server.
  • Contribution guidelines available in CONTRIBUTING.md.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Compatible with commercial use and closed-source projects.

Limitations & Caveats

Crawlee for Python is available for early adopters, but it is not the primary focus of this repository. The README also mentions pre-release versions and notes that dependency overrides may be needed when Crawlee is used together with the Apify SDK.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 36
  • Issues (30d): 45

Star History

  • 1,127 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 7 more.

firecrawl by mendableai

1.9%
44k
API service for turning websites into LLM-ready data
created 1 year ago
updated 1 day ago