zeroc00I: LLM data anonymization proxy for secure penetration testing
This project provides a transparent reverse proxy for Claude Code, designed to anonymize sensitive penetration testing data (like IPs, hashes, credentials, hostnames, and PII) before it's sent to the Anthropic API. It targets security professionals and researchers using LLMs for tasks involving client data, offering a robust solution to maintain data privacy and compliance during engagements. The primary benefit is enabling the use of powerful AI tools like Claude Code on sensitive pentest data without direct exposure of client-specific information.
How It Works
The system operates as a transparent proxy sitting between Claude Code and the Anthropic API. It intercepts all outgoing data (including bash command outputs, file reads, and grep results) and replaces sensitive information with realistic-looking surrogates. Detection is dual-layer: a local Ollama LLM (e.g., qwen3:4b) handles context-dependent entities such as hostnames, usernames, and credentials, while a deterministic regex safety net catches patterns such as IPs, CIDRs, hashes, and API keys. Mappings between original data and surrogates are stored persistently in a per-engagement SQLite vault, ensuring consistency and preventing collisions within a client's scope. Responses from Anthropic are de-anonymized using these mappings before being presented to Claude Code, so the remote Anthropic model never processes actual client data.
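The README does not include code, but the described flow can be sketched roughly as below. This is a minimal illustration, not the project's implementation: the Vault class, the vault.db filename, and the surrogate formats are assumptions, and only two of the regex patterns (IPv4 addresses and MD5-style hashes) are shown.

```python
# Minimal sketch of the regex "safety net" plus a per-engagement surrogate
# vault, loosely following the README's description. Names (Vault, vault.db)
# and surrogate formats are assumptions, not the project's actual API.
import re
import secrets
import sqlite3

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")   # deterministic pattern layer
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")

class Vault:
    """Persistent original <-> surrogate mapping for a single engagement."""

    def __init__(self, path: str = "vault.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS map (original TEXT PRIMARY KEY, surrogate TEXT)"
        )

    def surrogate_for(self, original: str, kind: str) -> str:
        row = self.db.execute(
            "SELECT surrogate FROM map WHERE original = ?", (original,)
        ).fetchone()
        if row:
            return row[0]                      # reuse mapping -> consistent surrogates
        if kind == "ip":
            surrogate = "10.%d.%d.%d" % tuple(secrets.randbelow(256) for _ in range(3))
        else:
            surrogate = secrets.token_hex(16)  # realistic-looking fake hash
        self.db.execute("INSERT INTO map VALUES (?, ?)", (original, surrogate))
        self.db.commit()
        return surrogate

    def anonymize(self, text: str) -> str:
        """Replace sensitive patterns before text leaves for the Anthropic API."""
        text = IPV4_RE.sub(lambda m: self.surrogate_for(m.group(), "ip"), text)
        return MD5_RE.sub(lambda m: self.surrogate_for(m.group(), "hash"), text)

    def deanonymize(self, text: str) -> str:
        """Map surrogates back to originals before the response reaches Claude Code."""
        for original, surrogate in self.db.execute("SELECT original, surrogate FROM map"):
            text = text.replace(surrogate, original)
        return text
```

Because the mapping is persisted per engagement, the same original value always yields the same surrogate across turns, which is what keeps multi-step conversations consistent on the Anthropic side.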
Quick Start & Requirements
Quick start: ./scripts/setup.sh, ollama pull qwen3:1.7b, then ./scripts/run.sh for the proxy and claude for the client. Alternatively, make docker-up brings up a containerized setup (CPU only).
Requirements: Ollama with a local model (qwen3:1.7b or qwen3:4b), and Docker (for Option C). Resource needs scale from the smallest model (qwen3:1.7b) upwards; setup involves script execution and model downloads.
Highlighted Details
Includes an automated improvement loop (scripts/auto_improve.py) for continuous enhancement of anonymization coverage, aiming for a 0% leak policy.
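The README does not document how scripts/auto_improve.py works internally; the 0% leak goal suggests a feedback loop of roughly the following shape. The find_leaks helper and the sample values are purely illustrative.

```python
# Hypothetical leak check in the spirit of the 0% leak policy; the actual
# logic of scripts/auto_improve.py is not documented in the README.
def find_leaks(anonymized_text: str, known_sensitive: list[str]) -> list[str]:
    """Return sensitive values that still appear verbatim after anonymization."""
    return [value for value in known_sensitive if value in anonymized_text]

known_sensitive = ["192.168.1.5", "acme-dc01.internal", "P@ssw0rd123"]        # made-up test values
anonymized = "RDP to 10.44.7.19 as admin:P@ssw0rd123 on acme-dc01.internal"   # only the IP was caught
print(find_leaks(anonymized, known_sensitive))
# ['acme-dc01.internal', 'P@ssw0rd123'] -> each leak drives a new regex pattern or prompt tweak
```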
Maintenance & Community
No specific details regarding maintainers, sponsorships, or community channels (like Discord or Slack) are provided in the README.
Licensing & Compatibility
The README does not specify a software license. Consequently, compatibility for commercial use or closed-source linking cannot be determined from the provided documentation.
Limitations & Caveats
The regex layer may miss context-dependent data like bare hostnames or unusual password formats, making the LLM layer essential. Very dense LLM outputs exceeding LLM_CHUNK_SIZE (default 1500 chars) might lose context at chunk boundaries. The system offers no provable privacy guarantee against metadata or writing style correlation attacks. There is a low, non-zero risk of surrogate collision if different original data maps to the same surrogate, though the per-engagement vault mitigates this within a session. This tool is not a substitute for reviewing NDAs and contracts regarding the use of cloud AI services.
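To make the chunk-boundary caveat concrete, here is a toy illustration; the proxy's real chunking logic is not documented in the README, and the tiny chunk size is used only for demonstration.

```python
# Toy illustration of the chunk-boundary caveat: a hostname straddling two
# chunks is never seen whole by per-chunk detection. The README mentions an
# LLM_CHUNK_SIZE default of 1500 characters; it is shrunk here for the demo.
LLM_CHUNK_SIZE = 20

text = "connect to acme-dc01.internal now"
chunks = [text[i:i + LLM_CHUNK_SIZE] for i in range(0, len(text), LLM_CHUNK_SIZE)]
print(chunks)  # ['connect to acme-dc01', '.internal now']: neither chunk holds the full hostname
```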