Caddy module to block AI training scrapers
This Caddy module addresses the growing concern of AI services and cloud providers scraping websites for training data. It allows website administrators to block or manipulate traffic from known IP ranges associated with these services, thereby protecting their content and preventing data pollution. The target audience includes website owners, developers, and system administrators seeking to control access and deter unwanted automated traffic.
How It Works
Caddy Defender operates as Caddy middleware, inspecting incoming requests and matching client IP addresses against a configurable list of IP ranges. It leverages efficient IP matching, inspired by balanced ART routing table implementations, to quickly identify and act upon requests from specified sources. The module supports various response actions, including returning standard HTTP error codes (403), custom messages, dropping connections, sending garbage data to pollute AI training sets, or redirecting traffic.
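The matching step described above can be illustrated with a minimal sketch. This is not the module's actual code: the `Blocklist` type, its methods, and the example ranges are hypothetical names chosen for illustration, and a plain linear scan over prefixes stands in for the faster ART-style lookup structure the module is said to use. It relies only on Go's standard `net/netip` package.

```go
package main

import "net/netip"

// Blocklist is a hypothetical holder for parsed CIDR ranges.
// The real module's types and lookup structure may differ.
type Blocklist struct {
	prefixes []netip.Prefix
}

// NewBlocklist parses CIDR strings such as "203.0.113.0/24".
func NewBlocklist(cidrs []string) (*Blocklist, error) {
	bl := &Blocklist{}
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, err
		}
		// Masked zeroes any host bits so each prefix is in canonical form.
		bl.prefixes = append(bl.prefixes, p.Masked())
	}
	return bl, nil
}

// Contains reports whether ip falls inside any configured range.
// This is a simple linear scan (an assumption made for brevity),
// not the trie-based lookup a production matcher would use.
func (bl *Blocklist) Contains(ip string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false // unparseable addresses never match
	}
	// Unmap turns IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) into plain
	// IPv4 so that they can match IPv4 prefixes.
	addr = addr.Unmap()
	for _, p := range bl.prefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

// StatusFor mimics a "block" style response: 403 for listed clients,
// 200 for everyone else.
func (bl *Blocklist) StatusFor(ip string) int {
	if bl.Contains(ip) {
		return 403
	}
	return 200
}
```

With a list built from `203.0.113.0/24`, `StatusFor("203.0.113.7")` yields 403 and `StatusFor("198.51.100.1")` yields 200. A linear scan is adequate for a handful of ranges; a radix or ART-style table becomes worthwhile once the list grows to thousands of prefixes.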
Quick Start & Requirements
docker pull ghcr.io/jasonlovesdoggo/caddy-defender:latest
docker run -d --name caddy -v /path/to/Caddyfile:/etc/caddy/Caddyfile -p 80:80 -p 443:443 ghcr.io/jasonlovesdoggo/caddy-defender:latest
Highlighted Details
Maintenance & Community
The project is actively maintained, with contributions welcomed. Further details on contributing can be found in CONTRIBUTING.md.
Licensing & Compatibility
Limitations & Caveats
The effectiveness of IP-based blocking relies on the accuracy and completeness of the IP range lists, which may require manual updates as services change their infrastructure. The "ratelimit" responder requires the caddy-ratelimit plugin to be installed separately.