crawlee by apify

Web scraping/browser automation library for building reliable crawlers

Created 9 years ago
20,992 stars

Top 2.1% on SourcePulse

Project Summary

Crawlee is a Node.js library for web scraping and browser automation, built for writing reliable and efficient crawlers. It targets developers who need to extract website data for AI, LLM, RAG, or GPT applications, and it works with several data formats and browser automation tools.

How It Works

Crawlee provides a unified interface for both HTTP and headless browser crawling, abstracting away the complexity of tools like Playwright and Puppeteer. It features a persistent queue for managing URLs, pluggable storage for scraped data, and built-in proxy rotation and session management. Together these features help crawlers appear more human-like, get past common bot protections, and scale automatically.
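To make the queue and proxy-rotation ideas concrete, here is a minimal, self-contained sketch in TypeScript. It does not use Crawlee or reproduce its API: `SimpleRequestQueue`, `ProxyRotator`, and all of their methods are hypothetical names chosen for illustration. The sketch assumes that deduplication ignores URL fragments and that proxies rotate in simple round-robin order, and a real crawler may do neither.

```typescript
// Hypothetical sketch: these classes are illustrative and are not Crawlee's API.
type QueuedRequest = { url: string; retries: number };

class SimpleRequestQueue {
  private pending: QueuedRequest[] = [];
  private seen = new Set<string>();

  // Enqueue a URL unless an equivalent one was already seen.
  // Returns true when the URL was new.
  add(url: string): boolean {
    const hashAt = url.indexOf('#');
    const key = hashAt === -1 ? url : url.slice(0, hashAt); // ignore fragments
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push({ url: key, retries: 0 });
    return true;
  }

  // Take the oldest pending request, or undefined when the queue is empty.
  next(): QueuedRequest | undefined {
    return this.pending.shift();
  }

  // Serialize the queue so a crawl could resume after a restart.
  snapshot(): string {
    return JSON.stringify({ pending: this.pending, seen: Array.from(this.seen) });
  }

  // Rebuild a queue from a snapshot produced by snapshot().
  static restore(json: string): SimpleRequestQueue {
    const data = JSON.parse(json) as { pending: QueuedRequest[]; seen: string[] };
    const queue = new SimpleRequestQueue();
    queue.pending = data.pending;
    queue.seen = new Set(data.seen);
    return queue;
  }
}

class ProxyRotator {
  private proxies: string[];
  private index = 0;

  constructor(proxies: string[]) {
    if (proxies.length === 0) throw new Error('at least one proxy is required');
    this.proxies = proxies;
  }

  // Return the next proxy in round-robin order.
  next(): string {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}
```

In this sketch, persistence is just a JSON string that the caller writes wherever it likes. That keeps the queue logic separate from the storage backend, which is the same separation the "pluggable storage" description suggests.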

Quick Start & Requirements

  • Install via npm: npm install crawlee playwright
  • Requires Node.js 16 or higher.
  • Full documentation: https://crawlee.dev/docs/introduction
  • CLI quick start: npx crawlee create my-crawler

Highlighted Details

  • Supports Playwright, Puppeteer, Cheerio, JSDOM, and raw HTTP.
  • Offers zero-config HTTP/2 support, replication of browser TLS fingerprints, and automatic browser management.
  • Features customizable lifecycles with hooks and configurable routing.
  • Includes ready-to-deploy Dockerfiles.
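
The "hooks and configurable routing" item can be sketched in a few lines. The TypeScript below illustrates label-based routing with pre-processing hooks in general and is not Crawlee's implementation. `LabelRouter`, `CrawlContext`, and the method names are hypothetical, and handlers are synchronous here only to keep the example short.

```typescript
// Hypothetical sketch of label-based routing with hooks; not Crawlee's API.
type CrawlContext = { url: string; label?: string; log: string[] };
type Handler = (ctx: CrawlContext) => void;

class LabelRouter {
  private handlers = new Map<string, Handler>();
  private defaultHandler: Handler = () => {};
  private preHooks: Handler[] = [];

  // Register a handler for requests carrying the given label.
  addHandler(label: string, handler: Handler): void {
    this.handlers.set(label, handler);
  }

  // Register the fallback used when no label matches.
  addDefaultHandler(handler: Handler): void {
    this.defaultHandler = handler;
  }

  // Register a hook that runs before every handler, in insertion order.
  addPreHook(hook: Handler): void {
    this.preHooks.push(hook);
  }

  // Run all hooks, then the handler matching ctx.label (or the default).
  dispatch(ctx: CrawlContext): void {
    for (const hook of this.preHooks) hook(ctx);
    const matched = ctx.label !== undefined ? this.handlers.get(ctx.label) : undefined;
    (matched ?? this.defaultHandler)(ctx);
  }
}
```

A crawler built this way would tag each enqueued URL with a label such as `DETAIL` and let the router pick the handler. Page-specific logic then stays out of one large request callback.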

Maintenance & Community

  • Developed by Apify.
  • Support channels: GitHub Issues, Stack Overflow, GitHub Discussions, Discord server.
  • Contribution guidelines available in CONTRIBUTING.md.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Compatible with commercial use and closed-source projects.

Limitations & Caveats

Crawlee for Python is available to early adopters, but it is not the primary focus of this repository. The README also mentions pre-release versions and notes that dependency overrides may be needed when using them with the Apify SDK.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 24
  • Issues (30d): 25

Star History

  • 227 stars in the last 30 days
