Self-hosted web scraper for data extraction via XPath
Top 11.9% on sourcepulse
Scraperr is a self-hosted web application designed for users to extract data from websites using XPath selectors. It offers a user-friendly interface for submitting URLs, defining scrape targets, managing past jobs, and downloading results, with optional AI integration for context-aware data analysis.
How It Works
Scraperr utilizes a queue-based system to manage scraping tasks, allowing users to submit multiple URLs and XPath queries. It supports scraping all pages within the same domain and allows custom JSON headers for requests. Results are displayed in a sortable table, with options to download as CSV and rerun jobs. The application also includes user management for organizing scraping activities and an API powered by FastAPI.
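The queue-and-XPath flow described above can be illustrated with a minimal sketch. This is not Scraperr's actual code: `ScrapeJob`, `run_job`, `drain`, `to_csv`, and the injected `fetch` callable are hypothetical names chosen for illustration, and the standard library's `xml.etree.ElementTree` stands in for a real HTML parser (it supports only a subset of XPath and requires well-formed markup).

```python
import csv
import io
import json
import queue
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ScrapeJob:
    """Hypothetical job record: one URL plus named XPath targets."""
    url: str
    elements: dict                                # target name -> XPath expression
    headers: dict = field(default_factory=dict)   # custom request headers


def run_job(job, fetch):
    """Fetch one page and return a row for every node matched by each XPath."""
    root = ET.fromstring(fetch(job.url, job.headers))
    rows = []
    for name, xpath in job.elements.items():
        for node in root.findall(xpath):
            text = "".join(node.itertext()).strip()
            rows.append({"url": job.url, "name": name, "text": text})
    return rows


def drain(jobs, fetch):
    """Process queued jobs in FIFO order until the queue is empty."""
    rows = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return rows
        rows.extend(run_job(job, fetch))
        jobs.task_done()


def to_csv(rows):
    """Serialize result rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "name", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Stand-in for an HTTP client so the sketch runs without network access.
    def fake_fetch(url, headers):
        return "<html><body><h1>Example</h1><p class='price'>9.99</p></body></html>"

    # Custom headers arrive as JSON text and are decoded into a dict.
    headers = json.loads('{"User-Agent": "example-agent"}')

    jobs = queue.Queue()
    jobs.put(ScrapeJob("https://example.com/a",
                       {"title": ".//h1", "price": ".//p[@class='price']"},
                       headers))
    print(to_csv(drain(jobs, fake_fetch)))
```

Passing `fetch` in as a parameter keeps the queue logic independent of any particular HTTP library, which also makes the sketch easy to exercise offline.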
Quick Start & Requirements
make deps build up-dev
Highlighted Details
/docs
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
MongoDB 5.0+ requires AVX CPU support, which may cause issues in certain virtual machine configurations. Users must ensure compliance with target websites' robots.txt and Terms of Service.
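Both caveats can be checked before deploying or scraping. The sketch below is a suggestion, not part of Scraperr: `has_avx` and `allowed_by_robots` are hypothetical helper names, the AVX check assumes a Linux x86 host that exposes CPU flags in `/proc/cpuinfo`, and the robots.txt check uses Python's standard `urllib.robotparser`. It does not cover Terms of Service, which need to be read manually.

```python
from urllib.robotparser import RobotFileParser


def has_avx(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'avx'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx" in value.split():
            return True
    return False


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # On a Linux host, read the real CPU flags (path assumed to exist).
    with open("/proc/cpuinfo") as fh:
        print("AVX available:", has_avx(fh.read()))

    # Example rules, inlined here instead of being downloaded from a site.
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(rules, "example-agent", "https://example.com/private/page"))
```

Inside a virtual machine the flags reflect what the hypervisor exposes to the guest, so the check is best run in the same environment where MongoDB will run.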