pasteguard by sgasser

Privacy proxy for LLMs masking PII and secrets

Created 2 months ago

546 stars

Top 58.6% on SourcePulse

Project Summary

Summary

PasteGuard is an OpenAI-compatible privacy proxy designed to protect sensitive data when interacting with Large Language Models (LLMs). It addresses the critical need for data privacy by masking Personally Identifiable Information (PII) and secrets before they are sent to external LLM providers or by routing requests to local LLMs. This solution is ideal for developers and organizations needing to comply with data protection policies while leveraging powerful AI services.

How It Works

The core of PasteGuard is its role as an intermediary. It intercepts LLM API requests, analyzes them for PII (names, emails, phone numbers, etc.) and secrets (API keys, private keys) using Microsoft Presidio, and then applies one of two protection strategies. In "Mask Mode," detected sensitive data is replaced with placeholders before the request is forwarded to the LLM provider, with restoration occurring upon response. Alternatively, "Route Mode" directs requests containing PII to a locally hosted LLM (e.g., Ollama, vLLM), ensuring sensitive data never leaves the user's network. This proxy architecture allows seamless integration by simply changing the API endpoint URL.

Quick Start & Requirements

Installation: Clone the repository (git clone https://github.com/sgasser/pasteguard.git), navigate into the directory, copy the example configuration (cp config.example.yaml config.yaml), and run docker compose up -d.
Configuration: Modify config.yaml for specific settings.
Endpoint: Point your application to http://localhost:3000/openai/v1 instead of the original LLM API endpoint.
Dashboard: Access the real-time monitoring dashboard at http://localhost:3000/dashboard.
Prerequisites: Docker and Docker Compose are required.
Documentation: Official Documentation and Integrations links are available.

Highlighted Details

Comprehensive Detection: Identifies a wide range of PII (names, emails, phone numbers, credit cards, IBANs, IP addresses, locations) and secrets (API keys, private keys, tokens) via Microsoft Presidio.
Multi-language Support: Operates effectively across 24 languages.
Broad Compatibility: Fully OpenAI-compatible, working seamlessly with SDKs (Python, JS), LangChain, LlamaIndex, Cursor, and other OpenAI-compatible tools.
Real-time Features: Supports streaming requests and responses, with an integrated dashboard for monitoring protected requests.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or project roadmap were found in the provided README.

Licensing & Compatibility

License: Apache 2.0.
Compatibility: Permissive license suitable for commercial use and integration into closed-source applications. Fully compatible with any OpenAI-compatible tool.

Limitations & Caveats

The provided README does not detail specific limitations, known bugs, or alpha/beta status. The project appears to be presented as a stable, production-ready solution.

Health Check

Last Commit

21 hours ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

51 stars in the last 30 days