presidio by microsoft

PII de-identification SDK for text and images

created 7 years ago
5,138 stars

Top 9.9% on sourcepulse

Project Summary

Presidio is an open-source SDK for detecting, redacting, masking, and anonymizing Personally Identifiable Information (PII) across text, images, and structured data. It targets developers and organizations needing to protect sensitive data, offering a customizable and extensible framework for data privacy compliance.

How It Works

Presidio has a modular architecture built around two components: an Analyzer that detects PII and an Anonymizer that transforms the detected spans. The Analyzer takes a hybrid approach, combining Named Entity Recognition (NER) models, regular expressions, rule-based logic, and checksums, and it can also integrate external PII detection models. Combining these signals lets it use surrounding context when identifying sensitive entities, and detection is supported across multiple languages.
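
The hybrid idea can be illustrated with a minimal standard-library sketch. This is not Presidio's code or API: the `Finding` class, the patterns, the context word list, and the score values are all assumptions chosen for illustration. A regular expression proposes candidates, a Luhn checksum filters out false positives, nearby keywords raise the confidence score, and a separate step replaces each detected span.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    """Hypothetical result record: entity type, character span, confidence."""
    entity_type: str
    start: int
    end: int
    score: float


# Illustrative patterns only; real recognizers are considerably more thorough.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_CONTEXT = ("card", "credit", "visa", "mastercard")  # assumed keyword list


def luhn_valid(candidate: str) -> bool:
    """Checksum step: return True if the digits pass the Luhn check."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def analyze(text: str) -> List[Finding]:
    """Detection step: regex candidates, checksum filter, context boost."""
    findings = []
    for m in CARD_PATTERN.finditer(text):
        if not luhn_valid(m.group()):
            continue  # pattern matched but checksum failed: discard
        score = 0.6  # assumed base score for a pattern + checksum match
        window = text[max(0, m.start() - 30):m.start()].lower()
        if any(word in window for word in CARD_CONTEXT):
            score = min(1.0, score + 0.35)  # assumed context boost
        findings.append(Finding("CREDIT_CARD", m.start(), m.end(), score))
    for m in EMAIL_PATTERN.finditer(text):
        findings.append(Finding("EMAIL_ADDRESS", m.start(), m.end(), 0.9))
    return sorted(findings, key=lambda f: f.start)


def anonymize(text: str, findings: List[Finding]) -> str:
    """Transformation step: replace spans right to left so offsets stay valid."""
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


sample = "Pay with card 4111 1111 1111 1111 or email jane@example.com"
print(anonymize(sample, analyze(sample)))
# prints: Pay with card <CREDIT_CARD> or email <EMAIL_ADDRESS>
```

Keeping detection and transformation as separate functions mirrors the Analyzer/Anonymizer split described above: the same findings could be fed to a different transformation without re-running detection.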

Quick Start & Requirements

  • Install: pip install presidio-analyzer presidio-anonymizer
  • Prerequisites: Python 3.7+, Docker for certain deployments.
  • Resources: Requires downloading language models for NER.
  • Docs: Full documentation
  • Demo: Demo
  • Examples: Examples
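
After installation, a first end-to-end call might look like the sketch below. It assumes both packages from the install step are present and that a spaCy English model has been downloaded (the Presidio documentation suggests `python -m spacy download en_core_web_lg`). The wrapper function `deidentify_text` is a hypothetical helper, not part of the SDK. The imports sit inside the function because constructing `AnalyzerEngine` loads the NLP model, which is slow.

```python
def deidentify_text(text: str, language: str = "en") -> str:
    """Hypothetical helper: detect PII in `text` and return an anonymized copy."""
    # Deferred imports: these packages and the NER model are only needed
    # once the function is actually called.
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    # Detection: returns a list of results with entity type, span, and score.
    analyzer = AnalyzerEngine()
    results = analyzer.analyze(text=text, language=language)

    # Transformation: with no operators given, the default behavior applies.
    anonymizer = AnonymizerEngine()
    return anonymizer.anonymize(text=text, analyzer_results=results).text
```

Calling `deidentify_text("My phone number is 212-555-5555")` should return the sentence with the phone number replaced by a placeholder. The exact output depends on the installed version and model, so treat this as a starting point rather than a reference result.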

Highlighted Details

  • Supports PII detection and redaction in images, including DICOM medical images.
  • Offers Python, PySpark, Docker, and Kubernetes deployment options.
  • Highly customizable PII recognizers and de-identification pipelines.
  • Handles various PII types like credit card numbers, names, and phone numbers.
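
To make the idea of a customizable de-identification pipeline concrete, here is a small standard-library sketch in which each entity type maps to its own transformation. It is an illustration only and does not use Presidio's API: the function `apply_operators`, the operator functions, and the span tuples are all assumed names.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end), assumed shape
Operator = Callable[[str, str], str]  # (matched_value, entity_type) -> text


def replace_with_label(value: str, entity_type: str) -> str:
    """Swap the value for a placeholder such as <EMAIL_ADDRESS>."""
    return f"<{entity_type}>"


def mask_all_but_last4(value: str, entity_type: str) -> str:
    """Hide every character except the last four."""
    return "*" * max(0, len(value) - 4) + value[-4:]


def sha256_hash(value: str, entity_type: str) -> str:
    """Replace the value with its SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def apply_operators(
    text: str,
    spans: List[Span],
    operators: Dict[str, Operator],
    default: Operator = replace_with_label,
) -> str:
    """Apply the operator configured for each entity type, right to left."""
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        op = operators.get(entity_type, default)
        text = text[:start] + op(text[start:end], entity_type) + text[end:]
    return text


text = "Call 212-555-5555 or write to jane@example.com"
phone, email = "212-555-5555", "jane@example.com"
spans = [
    ("PHONE_NUMBER", text.find(phone), text.find(phone) + len(phone)),
    ("EMAIL_ADDRESS", text.find(email), text.find(email) + len(email)),
]
print(apply_operators(text, spans, {"PHONE_NUMBER": mask_all_but_last4}))
# prints: Call ********5555 or write to <EMAIL_ADDRESS>
```

Keying operators by entity type lets one policy mask phone numbers while another hashes them, without touching the detection step.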

Maintenance & Community

  • Actively maintained by Microsoft.
  • Community discussions via GitHub discussions.
  • Bug reports and suggestions via GitHub issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

Presidio relies on automated detection, so it cannot guarantee that every piece of sensitive information will be found. Additional systems and safeguards are needed for comprehensive data protection.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 30
  • Issues (30d): 13

Star History

  • 625 stars in the last 90 days
