caddy-defender  by JasonLovesDoggo

Caddy module to block AI training scrapers

created 6 months ago
417 stars

Top 71.3% on sourcepulse

Project Summary

This Caddy module addresses the growing concern of AI services and cloud providers scraping websites for training data. It lets website administrators block or manipulate traffic from known IP ranges associated with these services, protecting their content from unwanted collection. The target audience includes website owners, developers, and system administrators who want to control access and deter unwanted automated traffic.

How It Works

Caddy Defender operates as Caddy middleware: it inspects each incoming request and matches the client IP address against a configurable list of IP ranges. Its IP matching is inspired by balanced ART routing table implementations, which lets it quickly identify requests from the specified sources. When a request matches, the module applies the configured response action, such as returning an HTTP 403 error, sending a custom message, dropping the connection, sending garbage data to pollute AI training sets, or redirecting the traffic.
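
The matching step described above can be illustrated with a minimal Python sketch. This is not the module's implementation: it uses a plain linear scan from the standard library's ipaddress module rather than an ART-style routing table, and the range names and CIDR blocks (reserved documentation networks) are hypothetical placeholders.

```python
import ipaddress

# Hypothetical range list. These CIDRs are reserved documentation networks,
# not real provider ranges.
BLOCKED_RANGES = {
    "example-ai": ["192.0.2.0/24", "2001:db8::/32"],
    "example-cloud": ["198.51.100.0/24"],
}


def compile_ranges(ranges):
    """Parse CIDR strings once so each request only does containment checks."""
    return [
        (name, ipaddress.ip_network(cidr))
        for name, cidrs in ranges.items()
        for cidr in cidrs
    ]


def match_source(client_ip, compiled):
    """Return the name of the first range containing client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for name, network in compiled:
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return name
    return None


compiled = compile_ranges(BLOCKED_RANGES)
print(match_source("192.0.2.10", compiled))
print(match_source("203.0.113.5", compiled))
```

A linear scan is fine for a handful of ranges; a routing-table structure becomes worthwhile when the list covers thousands of prefixes, as cloud provider lists typically do.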

Quick Start & Requirements

  • Installation: docker pull ghcr.io/jasonlovesdoggo/caddy-defender:latest
  • Running: docker run -d --name caddy -v /path/to/Caddyfile:/etc/caddy/Caddyfile -p 80:80 -p 443:443 ghcr.io/jasonlovesdoggo/caddy-defender:latest
  • Prerequisites: Docker, Caddyfile configuration.
  • Documentation: Online documentation and Getting Started guide.

Highlighted Details

  • Includes predefined IP ranges for major AI services like OpenAI, GitHub Copilot, DeepSeek, and cloud providers (AWS, Azure, GCP).
  • Supports custom IP range definitions via Caddyfile.
  • Offers multiple responder backends: block, custom, drop, garbage, redirect, ratelimit, and tarpit.
  • Leverages efficient IP matching for high-performance filtering.
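
To make the idea of responder backends concrete, here is a rough Python sketch of dispatching on a responder name once a request has matched. It is an illustration only: the respond function, its parameters, and the status codes for custom, garbage, and redirect are assumptions, not the module's documented behavior, and ratelimit and tarpit are left out.

```python
import random
import string


def respond(responder, message="Access denied", target="https://example.com/"):
    """Return (status, headers, body) for a matched request, or None to drop it.

    Hypothetical helper for illustration; not part of caddy-defender.
    """
    if responder == "block":
        return 403, {}, "Forbidden"
    if responder == "custom":
        # Assumption: the custom message is sent with a 403 status.
        return 403, {}, message
    if responder == "drop":
        # None stands in for closing the connection without a response.
        return None
    if responder == "garbage":
        # Random text intended to be useless as training data.
        body = "".join(random.choices(string.ascii_letters + " ", k=256))
        return 200, {}, body
    if responder == "redirect":
        # Assumption: a temporary redirect; the real status may differ.
        return 302, {"Location": target}, ""
    raise ValueError(f"responder not covered by this sketch: {responder}")


print(respond("block"))
print(respond("redirect"))
```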

Maintenance & Community

The project is actively maintained, and contributions are welcome. Details on contributing can be found in CONTRIBUTING.md.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

The effectiveness of IP-based blocking relies on the accuracy and completeness of the IP range lists, which may require manual updates as services change their infrastructure. The "ratelimit" responder requires the caddy-ratelimit plugin to be installed separately.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 54 stars in the last 90 days
