caddy-defender by JasonLovesDoggo

Caddy module to block AI training scrapers

Created 9 months ago

448 stars

Top 67.0% on SourcePulse


1 Expert Loves This Project

Matt Holt

Author of Caddy

Project Summary

This Caddy module addresses the growing concern of AI services and cloud providers scraping websites for training data. It lets website administrators block or manipulate traffic from known IP ranges associated with these services, protecting their content from unwanted harvesting. The target audience includes website owners, developers, and system administrators who want to control access and deter unwanted automated traffic.

How It Works

Caddy Defender operates as Caddy middleware: it inspects each incoming request and matches the client IP address against a configurable list of IP ranges. The lookup is inspired by balanced ART routing table implementations, so requests from listed sources can be identified quickly. Once a request matches, the configured responder decides what happens: return an HTTP 403, send a custom message, drop the connection, serve garbage data to pollute AI training sets, redirect the client, rate-limit it, or tarpit it.
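The match-then-respond flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the module's Go implementation: the range names, the `match_range` and `handle_request` helpers, the response bodies, and the redirect status and target are all hypothetical, and the prefixes are documentation-only addresses rather than any real service's ranges.

```python
import ipaddress

# Hypothetical range table using documentation-only prefixes
# (RFC 5737 and RFC 3849). These are NOT the ranges shipped with the module.
BLOCKED_RANGES = {
    "example-ai-service": ["203.0.113.0/24", "2001:db8::/32"],
    "example-cloud": ["198.51.100.0/24"],
}

# Parse every CIDR string once, up front, instead of on each request.
NETWORKS = [
    (name, ipaddress.ip_network(cidr))
    for name, cidrs in BLOCKED_RANGES.items()
    for cidr in cidrs
]


def match_range(client_ip):
    """Return the name of the first range containing client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for name, net in NETWORKS:
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return name
    return None


def handle_request(client_ip, responder="block"):
    """Return (status, body) as a stand-in for an HTTP response."""
    if match_range(client_ip) is None:
        # Not in any listed range: pass through to the normal handler.
        return 200, "Human-friendly content"
    if responder == "block":
        return 403, "Access denied"
    if responder == "redirect":
        # Status code and target are assumptions made for this sketch.
        return 308, "https://example.com/"
    raise ValueError(f"responder not covered by this sketch: {responder}")


print(handle_request("203.0.113.7"))  # address inside a listed range
print(handle_request("192.0.2.1"))    # address outside every listed range
```

The linear scan keeps the sketch short; it would become slow with many thousands of prefixes, which is why a routing-table-style structure is the more natural fit for the real lookup.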

Quick Start & Requirements

Installation: docker pull ghcr.io/jasonlovesdoggo/caddy-defender:latest
Running: docker run -d --name caddy -v /path/to/Caddyfile:/etc/caddy/Caddyfile -p 80:80 -p 443:443 ghcr.io/jasonlovesdoggo/caddy-defender:latest
Prerequisites: Docker, Caddyfile configuration.
Documentation: Online documentation and Getting Started guide.

Highlighted Details

Includes predefined IP ranges for AI services such as OpenAI, GitHub Copilot, and DeepSeek, and for cloud providers (AWS, Azure, GCP).
Supports custom IP range definitions via Caddyfile.
Offers multiple responder backends: block, custom, drop, garbage, redirect, ratelimit, and tarpit.
Uses a routing-table-style IP lookup, inspired by balanced ART implementations, to keep filtering fast.
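To give a feel for why routing-table-style lookups scale better than scanning a list of ranges, here is a minimal sketch of longest-prefix matching with a plain binary trie. It is a deliberately simpler structure than the balanced ART tables the module draws inspiration from, and the `PrefixTrie` class, its methods, and the labels are hypothetical names chosen for this example.

```python
import ipaddress


class PrefixTrie:
    """Binary trie keyed on address bits, with one root per IP version."""

    def __init__(self):
        self.roots = {4: {}, 6: {}}

    def insert(self, cidr, label):
        net = ipaddress.ip_network(cidr)
        node = self.roots[net.version]
        bits = int(net.network_address)
        width = net.max_prefixlen  # 32 for IPv4, 128 for IPv6
        # Walk one level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.setdefault(bit, {})
        node["label"] = label

    def lookup(self, ip):
        """Return the label of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(ip)
        node = self.roots[addr.version]
        bits = int(addr)
        width = addr.max_prefixlen
        best = node.get("label")  # covers a /0 prefix stored at the root
        for i in range(width):
            bit = (bits >> (width - 1 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            # Deeper labels belong to longer, more specific prefixes.
            best = node.get("label", best)
        return best


trie = PrefixTrie()
trie.insert("198.51.100.0/24", "example-cloud")
trie.insert("198.51.100.128/25", "example-ai-service")
trie.insert("2001:db8::/32", "example-ai-service")

print(trie.lookup("198.51.100.200"))  # falls in both prefixes; /25 is longer
print(trie.lookup("198.51.100.5"))    # only the /24 covers this address
print(trie.lookup("192.0.2.1"))       # no prefix covers this address
```

A lookup visits at most one node per address bit, so its cost depends on the address width rather than on how many ranges are configured.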

Maintenance & Community

The project is actively maintained and welcomes contributions; see CONTRIBUTING.md for details.

Licensing & Compatibility

License: MIT.
Compatibility: The permissive MIT license allows commercial use and integration with closed-source applications.

Limitations & Caveats

The effectiveness of IP-based blocking relies on the accuracy and completeness of the IP range lists, which may require manual updates as services change their infrastructure. The "ratelimit" responder requires the caddy-ratelimit plugin to be installed separately.

Health Check

Last Commit

2 days ago

Responsiveness

1 day

Star History

10 stars in the last 30 days