top_secret by thoughtbot

Filter sensitive text for LLM/API integrations

Created 2 months ago
268 stars

Project Summary

This Ruby library filters sensitive information out of free text before it is sent to external services, particularly chatbots and LLMs. Out of the box it redacts credit card numbers, email addresses, phone numbers, names, and locations, and developers can add custom filters for other kinds of data.

How It Works

Top Secret uses two kinds of filters: regex filters for structured data patterns, and NER (Named Entity Recognition) filters, powered by the MITIE library, for identifying entities such as people and locations in free-form text. NER filters rely on a trained language model and its confidence scores, while regex filters rely on pattern matching. Filters can be overridden, disabled, or added, and batch processing keeps redaction labels consistent across multiple messages.
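To make the regex half of this concrete, here is a minimal Ruby sketch of pattern-based redaction. It is not the gem's API: the `DEFAULT_FILTERS` table, the `redact` helper, the simplified patterns, and the `[LABEL_N]` placeholder format are all assumptions chosen for illustration. It shows each match being swapped for a numbered placeholder, a repeated value reusing its placeholder, and a filter set being overridden, disabled, or extended by editing an ordinary hash.

```ruby
# Minimal sketch of regex-based redaction. NOT the top_secret gem's API:
# DEFAULT_FILTERS, redact, and the "[LABEL_N]" format are hypothetical.

# Hypothetical filters: label => pattern (deliberately simplified).
# Ruby hashes keep insertion order, so card numbers are replaced first.
DEFAULT_FILTERS = {
  "CREDIT_CARD" => /\b(?:\d[ -]?){12,15}\d\b/,  # 13-16 digits, optional separators
  "EMAIL"       => /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/,
  "PHONE"       => /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/
}.freeze

# Returns [filtered_text, mapping], where mapping is placeholder => original.
def redact(text, filters: DEFAULT_FILTERS)
  mapping = {}
  counters = Hash.new(0) # label => number of placeholders issued so far

  output = filters.reduce(text) do |current, (label, pattern)|
    current.gsub(pattern) do |match|
      # Reuse the existing placeholder if this exact value was seen before.
      mapping.key(match) || begin
        counters[label] += 1
        placeholder = "[#{label}_#{counters[label]}]"
        mapping[placeholder] = match
        placeholder
      end
    end
  end

  [output, mapping]
end

text = "Email ralph@example.com or call 555-123-4567; " \
       "card 4242 4242 4242 4242. Again: ralph@example.com"
filtered, mapping = redact(text)
puts filtered
# => Email [EMAIL_1] or call [PHONE_1]; card [CREDIT_CARD_1]. Again: [EMAIL_1]
p mapping

# Customization: disable the phone filter and add a hypothetical ID filter.
custom = DEFAULT_FILTERS.reject { |label, _| label == "PHONE" }
                        .merge("EMPLOYEE_ID" => /\bEMP-\d{5}\b/)
puts redact("Call 555-123-4567 re EMP-12345", filters: custom).first
# => Call 555-123-4567 re [EMPLOYEE_ID_1]
```

Running the card pattern before the shorter patterns means digits belonging to a card number are already replaced when the phone pattern runs. Names and locations need a trained NER model and are not attempted in this sketch.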

Quick Start & Requirements

  • Installation: Install via RubyGems (gem install top_secret) or add to your Gemfile (bundle add top_secret).
  • Prerequisites: Requires MITIE Ruby, which in turn depends on MITIE. Users must download and extract the ner_model.dat file.
  • Configuration: The model_path can be configured in TopSecret.configure or set to nil to disable NER filtering, improving performance and removing the model file dependency. The ner_model.dat file is large and should not be committed to version control.
  • Links: Contribution details are available on GitHub.
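For the configuration step, the fragment below assumes the common Ruby block-style `configure` idiom. The summary names `TopSecret.configure` and `model_path`, but the block form and the example path are assumptions, so check the project's README for the exact syntax.

```ruby
# Assumed block-style configuration (sketch; verify against the README).
TopSecret.configure do |config|
  # Hypothetical path to the extracted MITIE model.
  # Keep this large file out of version control.
  config.model_path = "path/to/ner_model.dat"

  # Alternatively, disable NER filtering: no model file is needed and
  # filtering is faster, but names and locations are no longer detected.
  # config.model_path = nil
end
```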

Highlighted Details

  • Batch Processing: The filter_all method ensures identical sensitive data receives consistent redaction labels across multiple processed messages.
  • Restoration: Includes TopSecret::FilteredText.restore to substitute redaction placeholders in responses (e.g., from LLMs) back to their original values using a provided mapping.
  • LLM Integration: Provides guidance on instructing LLMs to reference filtered data by its placeholder, facilitating the restoration process.
  • Customization: Supports adding custom regex and NER filters, and overriding or disabling default filters on a per-call or global basis.
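The batch and restoration ideas can be illustrated with a small, self-contained Ruby sketch. The helpers `filter_all_sketch` and `restore_sketch`, the email-only pattern, and the `[EMAIL_N]` placeholder format are hypothetical stand-ins, not the actual behavior of `filter_all` or `TopSecret::FilteredText.restore`. The point is the mechanism: one mapping shared across all messages keeps labels consistent, and restoration only works if the LLM echoes placeholders verbatim, which is why instructing the model to refer to data by placeholder matters.

```ruby
# Sketch of batch redaction with a shared mapping, plus restoration.
# Hypothetical helpers; NOT top_secret's filter_all / FilteredText.restore.

EMAIL = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/
PLACEHOLDER = /\[[A-Z_]+_\d+\]/ # matches e.g. "[EMAIL_1]"

# Filters every message against ONE mapping, so the same address gets the
# same placeholder no matter which message it appears in.
def filter_all_sketch(messages)
  mapping = {} # placeholder => original value
  outputs = messages.map do |message|
    message.gsub(EMAIL) do |match|
      mapping.key(match) || begin
        placeholder = "[EMAIL_#{mapping.size + 1}]"
        mapping[placeholder] = match
        placeholder
      end
    end
  end
  [outputs, mapping]
end

# Swaps known placeholders in a response back to their original values.
# Unknown placeholders (e.g. one the LLM invented) are left untouched.
def restore_sketch(response, mapping)
  response.gsub(PLACEHOLDER) { |ph| mapping.fetch(ph, ph) }
end

outputs, mapping = filter_all_sketch([
  "Contact ralph@example.com",
  "Did ralph@example.com reply? Also cc ops@example.com"
])
p outputs
# => ["Contact [EMAIL_1]", "Did [EMAIL_1] reply? Also cc [EMAIL_2]"]

llm_response = "I emailed [EMAIL_1] and copied [EMAIL_2]."
puts restore_sketch(llm_response, mapping)
# => I emailed ralph@example.com and copied ops@example.com.
```

Leaving unknown placeholders in place rather than raising is one possible design choice; a production integration might instead report them so that a mislabeled or invented placeholder gets noticed.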

Maintenance & Community

This project is maintained and funded by thoughtbot, inc. Contributions are welcomed via GitHub, and a Code of Conduct is in place to foster a safe and collaborative environment.

Licensing & Compatibility

The project is free software that may be redistributed under the terms specified in its LICENSE file. This summary does not name the license or address commercial use or closed-source linking, so consult that file for those details.

Limitations & Caveats

The dependency on the MITIE library and the large ner_model.dat file can complicate deployment and resource management. NER filtering can be disabled, but doing so removes the ability to detect names and locations. Custom NER filters require users to train their own MITIE models, and default NER support is limited to the :person and :location tags.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 4
  • Issues (30d): 1
  • Star history: 22 stars in the last 30 days

Explore Similar Projects

  • llmparser by kyang6: LLM tool for structured data extraction and classification. 426 stars; created 2 years ago, updated 2 years ago. Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech), and 2 more.
  • lm-format-enforcer by noamgat: Format enforcer for language model outputs (JSON, regex, etc.). 2k stars; created 2 years ago, updated 1 month ago. Starred by Kaichao You (Core Maintainer of vLLM), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.
  • Curator by NVIDIA-NeMo: Data curation toolkit for LLMs. 1k stars; created 1 year ago, updated 13 hours ago. Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Jiaming Song (Chief Scientist at Luma AI), and 1 more.