Facepager by strohne

Automated web data retrieval and extraction

Created 13 years ago
541 stars

Top 58.3% on SourcePulse

Project Summary

Summary

Facepager is a tool for automated data retrieval from websites and APIs such as YouTube, Twitter, and knowledge infrastructure sources. It handles the mechanics of complex collection tasks, including multi-threading, rate limits, and pagination, so researchers and power users can efficiently gather and export public data.

How It Works

Facepager automates online data collection via APIs and web scraping, managing multi-threaded requests, rate limits, pagination, and data extraction. Collected data is stored in a SQLite database and can be exported to CSV. Presets cover a range of sources, and users can define custom query pipelines for targeted collection, including requests to cloud services.
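The paginated, rate-limited retrieval described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Facepager's own code: `RateLimiter`, `fetch_all`, and `fake_fetch` are hypothetical names, and the sketch assumes an API that returns a list of items plus a cursor for the next page (or `None` when there are no more pages). A fake three-page API stands in for a real endpoint.

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical page shape: (items on this page, cursor for the next page or None).
Page = tuple[list[dict], Optional[str]]


class RateLimiter:
    """Allow at most `max_calls` calls in any `period`-second sliding window."""

    def __init__(self, max_calls: int, period: float) -> None:
        self.max_calls = max_calls
        self.period = period
        self.calls: list[float] = []  # monotonic timestamps of recent calls

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have left the window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Block until the oldest call in the window expires.
            time.sleep(self.period - (now - self.calls[0]))
        self.calls.append(time.monotonic())


def fetch_all(fetch_page: Callable[[Optional[str]], Page],
              limiter: RateLimiter,
              max_pages: int = 100) -> Iterator[dict]:
    """Follow next-page cursors until none is returned or max_pages is reached."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        limiter.wait()
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


# Fake three-page API standing in for a real endpoint (hypothetical data).
FAKE_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}, {"id": 4}], "p3"),
    "p3": ([{"id": 5}, {"id": 6}], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


ids = [item["id"] for item in fetch_all(fake_fetch, RateLimiter(max_calls=2, period=0.1))]
print(ids)  # [1, 2, 3, 4, 5, 6]
```

The `max_pages` cap is a defensive choice for the sketch: it stops collection even if an API keeps returning a cursor indefinitely.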

Quick Start & Requirements

  • Installation: Windows (.exe installer), macOS (.pkg, requires security adjustments), Linux (build from source per src/readme.md).
  • Prerequisites: Users need to provide their own API credentials for Facebook, Twitter, and YouTube due to API changes.
  • Guidance: Presets are available; detailed guidance is on the Wiki and YouTube tutorials.

Highlighted Details

  • Focuses on knowledge infrastructure APIs (Open Library, OpenAlex, Crossref, arxiv.org, Wikidata, Wikipedia, etc.) and cloud services (OpenAI, AWS).
  • Handles multi-threaded collection, rate limits, pagination, and data extraction.
  • Stores data in SQLite, exportable to CSV.
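Multi-threaded collection with storage in SQLite and export to CSV might look roughly like the following Python sketch. It is an illustration under stated assumptions rather than Facepager's implementation: `fetch_for_seed` is a hypothetical placeholder for one API query per seed, and the `results` table layout is invented for the example. Queries run in a thread pool, while all database writes stay on the main thread.

```python
import csv
import io
import json
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_for_seed(seed: str) -> list[dict]:
    """Hypothetical stand-in for one API query per seed (e.g. a channel or DOI)."""
    return [{"seed": seed, "rank": i, "title": f"{seed}-item{i}"} for i in range(2)]


def collect(seeds: list[str], db: sqlite3.Connection, workers: int = 4) -> None:
    """Run queries concurrently and store each result row in SQLite."""
    db.execute("CREATE TABLE IF NOT EXISTS results "
               "(seed TEXT, rank INTEGER, raw_json TEXT)")
    # pool.map yields results in input order; writes happen on this thread only,
    # because Python's sqlite3 connections reject cross-thread use by default.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for seed, items in zip(seeds, pool.map(fetch_for_seed, seeds)):
            db.executemany(
                "INSERT INTO results VALUES (?, ?, ?)",
                [(seed, item["rank"], json.dumps(item)) for item in items],
            )
    db.commit()


def export_csv(db: sqlite3.Connection) -> str:
    """Dump the results table to CSV text, header row first."""
    cursor = db.execute("SELECT seed, rank, raw_json FROM results ORDER BY seed, rank")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


db = sqlite3.connect(":memory:")
collect(["alpha", "beta"], db)
print(export_csv(db))
```

Storing the raw JSON alongside a few extracted columns keeps the full response available for later re-extraction; that layout is a design choice of this sketch, not something the summary specifies.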

Maintenance & Community

Help is available via the Facepager Usergroup on Facebook. Updates are announced on the Facepager Facebook Page.

Licensing & Compatibility

Distributed under the permissive MIT License, allowing commercial use, modification, and distribution with attribution.

Limitations & Caveats

Official Facebook, Twitter, and YouTube API support is limited; users must obtain and configure their own API keys. Database files may not be compatible across versions.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
