thoughtbot
Filter sensitive text for LLM/API integrations
Top 89.8% on SourcePulse
This library filters sensitive information out of free text before it is sent to external services, particularly chatbots and LLMs. It redacts credit card numbers, emails, phone numbers, names, and locations, and lets developers add custom filters.
How It Works
Top Secret combines two kinds of filters: regex filters for structured data patterns, and NER (Named Entity Recognition) filters, powered by the MITIE library, for identifying entities such as people and locations in free-form text. NER filters rely on trained language models and confidence scores, while regex filters rely on pattern matching. Filters can be overridden, disabled, or added, and batch processing keeps redaction labels consistent across multiple messages.
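The regex half of this approach, together with consistent labels across a batch, can be illustrated with a short stdlib-only sketch. This is not the gem's implementation: the names `PATTERNS`, `redact_all`, and `restore` are hypothetical, the two patterns are deliberately simplified assumptions, and the `[TYPE_N]` placeholder format is chosen here only for illustration.

```ruby
# Stdlib-only sketch of regex-based redaction with batch-consistent labels.
# NOT the top_secret gem's code; all names and patterns here are assumptions.
PATTERNS = {
  "EMAIL" => /\b[\w.+\-]+@[\w\-]+(?:\.[\w\-]+)+\b/,
  "PHONE_NUMBER" => /\b\d{3}[-.]\d{3}[-.]\d{4}\b/
}.freeze

# Redacts every message in the batch. One label table is shared across
# all messages, so the same value always receives the same placeholder.
# Returns [redacted_messages, mapping], where mapping is placeholder => original.
def redact_all(messages)
  labels = {}              # original value => placeholder
  counters = Hash.new(0)   # per-type counter used to number placeholders

  outputs = messages.map do |message|
    PATTERNS.reduce(message) do |text, (type, pattern)|
      text.gsub(pattern) do |match|
        # The counter only advances the first time a value is seen.
        labels[match] ||= "[#{type}_#{counters[type] += 1}]"
      end
    end
  end

  [outputs, labels.invert]
end

# Substitutes placeholders in a response back to their original values.
# Unknown placeholders are left untouched.
def restore(text, mapping)
  text.gsub(/\[[A-Z_]+_\d+\]/) { |placeholder| mapping.fetch(placeholder, placeholder) }
end

outputs, mapping = redact_all([
  "Email ralph@example.com or call 555-123-4567.",
  "Reply to ralph@example.com today."
])
puts outputs
puts restore("I wrote to [EMAIL_1].", mapping)
```

Sharing one label table across the whole batch is what keeps `ralph@example.com` mapped to the same placeholder in both messages. The inverted table is the kind of mapping a restore step needs to put original values back into a response.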
Quick Start & Requirements
Install the gem (gem install top_secret) or add it to your Gemfile (bundle add top_secret).
NER filtering requires the MITIE library and a ner_model.dat file.
model_path can be configured in TopSecret.configure, or set to nil to disable NER filtering, which improves performance and removes the model file dependency.
The ner_model.dat file is large and should not be committed to version control.
Highlighted Details
The filter_all method ensures that identical sensitive data receives consistent redaction labels across multiple processed messages.
Use TopSecret::FilteredText.restore to substitute redaction placeholders in responses (e.g., from LLMs) back to their original values using a provided mapping.
Maintenance & Community
This project is maintained and funded by thoughtbot, inc. Contributions are welcomed via GitHub, and a Code of Conduct is in place to foster a safe and collaborative environment.
Licensing & Compatibility
The software is described as "free software that may be redistributed under the terms specified in the LICENSE file." Specific license details or compatibility notes for commercial use or closed-source linking are not explicitly detailed in the provided text.
Limitations & Caveats
The primary dependency on the MITIE library and the large ner_model.dat file can pose challenges for deployment and resource management. While NER filtering can be disabled, this removes the capability to detect names and locations. Custom NER filters require users to train their own MITIE models, and default NER support is limited to :person and :location tags.