TracyWang95
Local-first data redaction for documents and images
Top 74.9% on SourcePulse
RedactionEverything provides local-first, privacy-preserving redaction of sensitive information in unstructured data: documents, scanned PDFs, images, and text. It targets engineers, researchers, and power users who need robust anonymization without relying on external APIs, and it offers a workbench for finding, reviewing, and redacting sensitive content. Because all processing stays within a local or intranet environment, the main benefit is stronger data security and easier compliance.
How It Works
The system employs a local-first architecture, splitting processing into distinct text and vision pipelines. Textual data is handled by HaS Text semantic NER, with regex as a fallback. The vision pipeline integrates OCR for text extraction from images and scanned documents, HaS Image YOLO for detecting visual privacy regions (faces, seals, etc.), and a VLM for more nuanced visual-semantic tasks like signature detection. It supports configurable schemas, including general, legal, finance, and healthcare presets, enabling domain-specific redaction. The workflow encompasses recognition, human review, redaction, and export, providing a complete anonymization solution.
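The text side of this flow (schema preset, semantic NER, regex fallback, redaction) can be sketched in a few lines of Python. This is only an illustration, not the project's code: the `Span` type, the `PRESETS` and `FALLBACK_PATTERNS` tables, and the `find_spans` and `redact` functions are hypothetical names, the label sets are invented, and the sketch assumes the regex fallback runs when the NER model is unavailable, fails, or returns nothing.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


# Hypothetical fallback patterns; the real project's rules are not shown in the summary.
FALLBACK_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical presets: each maps a domain name to the labels it redacts.
PRESETS = {
    "general": {"EMAIL", "PHONE"},
    "legal": {"EMAIL"},
}


def regex_spans(text: str, labels: Set[str]) -> List[Span]:
    """Collect matches for every fallback pattern enabled by the preset."""
    spans = []
    for label, pattern in FALLBACK_PATTERNS.items():
        if label in labels:
            spans.extend(Span(m.start(), m.end(), label) for m in pattern.finditer(text))
    return spans


def find_spans(
    text: str,
    preset: str = "general",
    ner: Optional[Callable[[str], List[Span]]] = None,
) -> List[Span]:
    """Use the NER callable when present; otherwise fall back to regex (assumed policy)."""
    labels = PRESETS[preset]
    spans: List[Span] = []
    if ner is not None:
        try:
            spans = [s for s in ner(text) if s.label in labels]
        except Exception:
            spans = []  # treat a failing model like a missing one
    if not spans:
        spans = regex_spans(text, labels)
    return sorted(spans, key=lambda s: s.start)


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a [LABEL] placeholder, skipping overlapping spans."""
    out, cursor = [], 0
    for s in spans:
        if s.start < cursor:
            continue
        out.append(text[cursor:s.start])
        out.append(f"[{s.label}]")
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


sample = "Mail a.b@example.com or call 555-123-4567."
print(redact(sample, find_spans(sample, preset="general")))
```

In a real workflow the spans returned by `find_spans` would go to the human review step before `redact` is applied, which is why detection and replacement are kept as separate functions here.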
Quick Start & Requirements
npm run dev (Windows + WSL).
Highlighted Details
Maintenance & Community
The project includes CI via GitHub Actions and welcomes pull requests. No specific community channels (e.g., Discord, Slack) or roadmap links are detailed in the README.
Licensing & Compatibility
The project is released under a custom "Personal Use License." This license permits free use for individuals for personal, non-commercial purposes. Commercial use, including by companies, institutions, or for production deployments, requires a separate commercial license.
Limitations & Caveats
The full vision pipeline, particularly VLM-based signature detection, is recommended to run with 16 GB of VRAM; systems with less may see degraded performance. The VLM is a complementary layer on top of YOLO, not a direct replacement for it. The project prioritizes practical local deployment over using the largest possible models.
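One way to read "complementary layer" is that VLM detections are added to the YOLO detections rather than substituted for them. The summary does not say how the two outputs are actually combined, so the following Python sketch is an assumption: the `Box` type, the `merge_regions` function, and the 0.5 IoU threshold are all hypothetical. It keeps every YOLO box and adds a VLM box only when it does not substantially overlap a region already kept.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    source: str  # "yolo" or "vlm" (hypothetical tags)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_regions(yolo: List[Box], vlm: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Keep all YOLO boxes; add VLM boxes that do not duplicate a kept region."""
    merged = list(yolo)
    for candidate in vlm:
        if all(iou(candidate, kept) < iou_threshold for kept in merged):
            merged.append(candidate)
    return merged


yolo_boxes = [Box(0, 0, 10, 10, "face", "yolo")]
vlm_boxes = [
    Box(1, 1, 10, 10, "face", "vlm"),          # overlaps the YOLO face, dropped
    Box(20, 20, 30, 25, "signature", "vlm"),   # new region, kept
]
for box in merge_regions(yolo_boxes, vlm_boxes):
    print(box.label, box.source)
```

Under this reading, a machine without enough VRAM for the VLM could pass an empty `vlm` list and still get the YOLO regions, which matches the description of the VLM as an optional extra layer.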