crawlee-python by apify

Python library for web scraping and browser automation

Created 2 years ago
7,369 stars

Top 6.9% on SourcePulse

1 Expert Loves This Project
Project Summary

Crawlee for Python is a comprehensive library for building reliable web scrapers and automating browser interactions. It targets developers needing to extract data for AI, LLMs, or RAG applications, offering a unified interface for both raw HTTP requests and headless browser automation, with built-in proxy rotation and robust error handling.

How It Works

Crawlee provides two primary crawler types: BeautifulSoupCrawler for efficient HTML parsing via HTTP requests, and PlaywrightCrawler for JavaScript-heavy sites using headless browsers. This dual approach allows users to select the most performant method for their specific needs. Its asynchronous, asyncio-based architecture and extensive configuration options enable fine-grained control over crawling behavior, retries, and data storage.
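The HTTP-parsing path and the asyncio-based design can be illustrated with a minimal standard-library sketch. This is not Crawlee's API: the PAGES dictionary, the fetch stub, the TitleAndLinks parser, and the crawl function are hypothetical stand-ins, and a real crawler would issue network requests instead of reading pages from memory.

```python
import asyncio
from html.parser import HTMLParser

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "https://example.test/": (
        '<title>Home</title>'
        '<a href="https://example.test/a">A</a>'
        '<a href="https://example.test/b">B</a>'
    ),
    "https://example.test/a": '<title>Page A</title><a href="https://example.test/">Home</a>',
    "https://example.test/b": "<title>Page B</title>",
}


class TitleAndLinks(HTMLParser):
    """Collects the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


async def fetch(url):
    """Assumed fetch stub: yields to the event loop, then returns stored HTML."""
    await asyncio.sleep(0)
    return PAGES[url]


async def crawl(start_url, max_concurrency=2):
    """Visit every reachable page once, with a cap on concurrent fetches."""
    seen = {start_url}
    results = {}
    semaphore = asyncio.Semaphore(max_concurrency)

    async def handle(url):
        async with semaphore:
            html = await fetch(url)
        parser = TitleAndLinks()
        parser.feed(html)
        results[url] = parser.title
        # Enqueue only known, not-yet-seen links.
        new_links = []
        for link in parser.links:
            if link in PAGES and link not in seen:
                seen.add(link)
                new_links.append(link)
        if new_links:
            await asyncio.gather(*(handle(link) for link in new_links))

    await handle(start_url)
    return results


if __name__ == "__main__":
    print(asyncio.run(crawl("https://example.test/")))
```

The semaphore is held only while a page is being fetched, so following links never blocks on the concurrency limit. A browser-based crawler would replace the fetch and parse steps with a rendered page, while the scheduling logic would stay the same.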

Quick Start & Requirements

Highlighted Details

  • Unified interface for HTTP and headless browser crawling.
  • Asyncio-based for high performance and compatibility with other asynchronous Python libraries.
  • Automatic retries, proxy rotation, and session management.
  • Configurable request routing and pluggable storage.
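As a rough illustration of how automatic retries and proxy rotation can fit together, here is a minimal standard-library sketch. It does not show Crawlee's own implementation or API: FetchError, fetch_with_retries, and the fetch callable's (url, proxy) signature are assumptions made for this example.

```python
import asyncio
import itertools


class FetchError(Exception):
    """Hypothetical error raised when a request fails or is blocked."""


async def fetch_with_retries(url, fetch, proxies, max_retries=3):
    """Try `fetch(url, proxy)` up to `max_retries` times, rotating proxies.

    `fetch` is an assumed async callable; each attempt uses the next proxy
    in round-robin order.
    """
    proxy_cycle = itertools.cycle(proxies)
    last_error = None
    for _ in range(max_retries):
        proxy = next(proxy_cycle)
        try:
            return await fetch(url, proxy)
        except FetchError as exc:
            # Remember the failure and move on to the next proxy.
            last_error = exc
    raise FetchError(f"{url} failed after {max_retries} attempts") from last_error
```

Rotating to a different proxy on each attempt means a single blocked address does not exhaust the retry budget. Session management would extend this by tracking cookies and failure counts per proxy.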

Maintenance & Community

  • Developed by Apify.
  • Support channels: GitHub Issues, Stack Overflow, GitHub Discussions, Discord server.
  • Contribution guidelines available in CONTRIBUTING.md.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The library is described as open to early adopters, so its API may still change as development continues. It aims to bypass bot protections, but effectiveness may vary against sophisticated anti-bot measures.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 43
  • Issues (30d): 18

Star History

106 stars in the last 30 days

Explore Similar Projects

Starred by Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), Jeremy Howard (Cofounder of fast.ai), and 2 more.

trafilatura by adbar

Top 0.6%
5k stars
Python package for web text extraction
Created 6 years ago
Updated 4 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Gregor Zunic (Cofounder of Browser Use), and 1 more.

suna by kortix-ai

Top 0.5%
19k stars
Open-source AI agent for real-world task automation
Created 1 year ago
Updated 17 hours ago