Discover and explore top open-source AI tools and projects—updated daily.
Self-hosted web archiver for preserving web content
Top 1.6% on SourcePulse
ArchiveBox is an open-source, self-hosted web archiving solution designed for individuals and organizations to preserve digital content. It captures web pages, social media posts, videos, and code repositories in durable, accessible formats, ensuring long-term data availability and user control.
How It Works
ArchiveBox leverages a suite of industry-standard tools like Chrome, wget
, and yt-dlp
to capture content. It saves snapshots in multiple redundant formats including HTML, PNG, PDF, WARC, and plain text. The system is designed for data longevity, storing information in ordinary files and folders, making it readable without ArchiveBox itself. It offers a modular architecture, allowing users to enable or disable specific extractors based on their needs.
Quick Start & Requirements
docker compose run archivebox init --setup
). Alternatives include plain Docker, pip install archivebox
, or an auto-setup script.init --setup
.Highlighted Details
yt-dlp
), and Git repositories.Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
wget
and DOM to mitigate this.4 months ago
1 day