Python library for web scraping and browser automation
Crawlee for Python is a comprehensive library for building reliable web scrapers and automating browser interactions. It targets developers needing to extract data for AI, LLMs, or RAG applications, offering a unified interface for both raw HTTP requests and headless browser automation, with built-in proxy rotation and robust error handling.
How It Works
Crawlee provides two primary crawler types: BeautifulSoupCrawler for efficient HTML parsing via HTTP requests, and PlaywrightCrawler for JavaScript-heavy sites using headless browsers. This dual approach allows users to select the most performant method for their specific needs. Its asynchronous, asyncio-based architecture and extensive configuration options enable fine-grained control over crawling behavior, retries, and data storage.
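This summary does not show Crawlee's own API, so the following is only a rough standard-library sketch of the asyncio pattern described above: bounded concurrency plus per-request retries, with results and failures collected separately. The names `crawl`, `make_fake_fetch`, and `CrawlResult` are hypothetical and are not part of Crawlee. The fetcher is a fake that runs offline.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CrawlResult:
    """Hypothetical container: extracted data plus URLs that never succeeded."""
    data: dict = field(default_factory=dict)
    failed: list = field(default_factory=list)


async def crawl(urls, fetch, max_retries=2, concurrency=3):
    """Fetch every URL with bounded concurrency, retrying failed requests."""
    semaphore = asyncio.Semaphore(concurrency)
    result = CrawlResult()

    async def worker(url):
        # One initial attempt plus up to `max_retries` retries.
        for attempt in range(max_retries + 1):
            try:
                async with semaphore:
                    result.data[url] = await fetch(url)
                return
            except Exception:
                if attempt == max_retries:
                    result.failed.append(url)

    await asyncio.gather(*(worker(url) for url in urls))
    return result


def make_fake_fetch():
    """Build an offline stand-in for an HTTP client (assumption, for demo only)."""
    calls = {}

    async def fetch(url):
        calls[url] = calls.get(url, 0) + 1
        await asyncio.sleep(0)  # yield control, as real network I/O would
        if "broken" in url:
            raise ConnectionError(url)  # this URL never succeeds
        if calls[url] == 1:
            raise TimeoutError(url)  # every other URL fails once, then works
        return f"<html>{url}</html>"

    return fetch, calls


if __name__ == "__main__":
    fetch, calls = make_fake_fetch()
    outcome = asyncio.run(
        crawl(["https://a.example", "https://broken.example"], fetch)
    )
    print(outcome.data, outcome.failed, calls)
```

A real crawler would replace the fake `fetch` with an HTTP client or a headless browser call. The retry and concurrency limits here are arbitrary values chosen for the demonstration.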
Quick Start & Requirements
python -m pip install 'crawlee[all]'
playwright install
python -c 'import crawlee; print(crawlee.__version__)'
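To confirm from Python that the packages installed above are visible to the current interpreter, a small check such as the following may help. It uses only the standard library. The helper name `installed_version` is hypothetical, and the distribution names `crawlee` and `playwright` are assumed to match the packages installed by the commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions based on the install commands.
    for name in ("crawlee", "playwright"):
        found = installed_version(name)
        print(f"{name}: {found or 'not installed'}")
```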
Highlighted Details
Maintenance & Community
CONTRIBUTING.md
Licensing & Compatibility
Limitations & Caveats
The library is described as open to early adopters, which suggests it is still under active development and that its API may change. It aims to bypass bot protections, but its effectiveness may vary against sophisticated anti-bot measures.