Web scraping/browser automation library for building reliable crawlers
Crawlee is a comprehensive Node.js library for web scraping and browser automation, designed for building reliable and efficient crawlers. It targets developers who need to extract data from websites for AI, LLM, RAG, or GPT applications, and it supports various data formats and browser automation tools.
How It Works
Crawlee provides a unified interface for both HTTP and headless browser crawling, abstracting away complexities of tools like Playwright and Puppeteer. It features a persistent queue for managing URLs, pluggable storage for scraped data, and built-in proxy rotation and session management. This approach allows crawlers to mimic human behavior, bypass bot protections, and scale automatically.
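The ideas in this paragraph (one crawl loop shared by different page loaders, a de-duplicating URL queue, and pluggable storage) can be sketched in a few lines of TypeScript. This is a conceptual illustration only, not Crawlee's implementation or API: the names PageLoader, RecordStore, UrlQueue, and crawl are all hypothetical, and the queue is kept in memory rather than persisted.

```typescript
// Illustrative sketch only: every name here is hypothetical, not Crawlee's API.

// Result of loading one page, regardless of how it was loaded.
interface PageResult {
  title: string;
  links: string[];
}

// A loader hides whether a page came from plain HTTP or a headless browser.
type PageLoader = (url: string) => Promise<PageResult>;

// Pluggable storage: anything that can accept a scraped record.
interface RecordStore {
  push(record: { url: string; title: string }): void;
}

// URL queue that refuses duplicates, so each URL is queued at most once.
// (In-memory here; a persistent queue would also write this state to disk.)
class UrlQueue {
  private pending: string[] = [];
  private seen = new Set<string>();

  add(url: string): boolean {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    this.pending.push(url);
    return true;
  }

  next(): string | undefined {
    return this.pending.shift();
  }
}

// Crawl loop: take a URL, load it, store the result, enqueue discovered links.
async function crawl(
  startUrls: string[],
  load: PageLoader,
  store: RecordStore,
  maxRequests: number,
): Promise<number> {
  const queue = new UrlQueue();
  for (const url of startUrls) queue.add(url);

  let handled = 0;
  while (handled < maxRequests) {
    const url = queue.next();
    if (url === undefined) break; // queue drained
    const page = await load(url);
    store.push({ url, title: page.title });
    for (const link of page.links) queue.add(link);
    handled++;
  }
  return handled;
}
```

Because the loop depends only on the PageLoader type, swapping an HTTP-based loader for a browser-based one would not change the crawl logic, which is the point of a unified interface. In Crawlee itself, the corresponding roles are played by its crawler classes, its request queue, and its dataset storage; consult the library documentation for the real signatures.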
Quick Start & Requirements
Install into an existing project: npm install crawlee playwright
Scaffold a new project with the CLI: npx crawlee create my-crawler
Highlighted Details
Maintenance & Community
CONTRIBUTING.md
Licensing & Compatibility
Limitations & Caveats
Crawlee for Python is available for early adopters but is not the primary focus of this repository. The README also mentions pre-release versions and notes that dependency overrides may be needed when using the Apify SDK.