unredact by Alex-Gilbert

Guess redacted text in PDFs using AI and constraint solving

Created 4 months ago

313 stars

Project Summary

A browser-based research tool, Unredact generates plausible guesses for redacted text in PDFs by combining OCR, font-aware constraint solving, and LLM reasoning. It is aimed at users who want to explore what redacted passages might contain. It runs client-side as a static site with no backend of its own, although the LLM scoring step sends candidates and their surrounding text to the Claude API.

How It Works

The system rasterizes each PDF, then uses Tesseract.js for OCR and a WASM module to detect redactions and heuristically identify the font and font size. A WASM-compiled constraint solver enumerates candidate strings whose rendered width, computed from the detected font metrics, matches the pixel width of the redacted region. Finally, the Claude API (or a similar LLM) scores these candidates for contextual plausibility against the surrounding text, and results are ranked by a composite score.
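The width-matching and ranking steps can be illustrated with a minimal Python sketch. It is not Unredact's implementation: the `ADVANCES` table holds made-up advance widths rather than real font metrics, kerning and ligatures are ignored, the LLM is replaced by a hand-written dictionary of plausibility scores, and the function names, the 2 px tolerance, and the 50/50 weighting of the composite score are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical advance widths in font units (units-per-em = 1000); illustrative
# numbers only. A real solver would read them from the detected font's metrics.
ADVANCES = {
    "a": 556, "e": 556, "h": 556, "i": 222, "l": 222, "n": 556,
    "o": 556, "r": 333, "s": 500, "t": 278, " ": 278,
}
UNITS_PER_EM = 1000


def text_width_px(text: str, font_size_px: float) -> float:
    """Sum per-glyph advances and scale font units to pixels (kerning ignored)."""
    units = sum(ADVANCES[ch] for ch in text)
    return units * font_size_px / UNITS_PER_EM


def width_candidates(words, target_px, font_size_px, tolerance_px=2.0):
    """Keep (word, error) pairs whose rendered width is within tolerance of the box."""
    kept = []
    for word in words:
        if not all(ch in ADVANCES for ch in word):
            continue  # the metric table cannot measure this word, so skip it
        err = abs(text_width_px(word, font_size_px) - target_px)
        if err <= tolerance_px:
            kept.append((word, err))
    return kept


@dataclass
class Scored:
    text: str
    width_error_px: float
    llm_score: float  # assumed 0..1 plausibility returned by the LLM
    composite: float


def rank(candidates, llm_scores, tolerance_px=2.0, w_width=0.5, w_llm=0.5):
    """Rank by a weighted sum of width fit (1 = exact, 0 = at tolerance) and LLM score."""
    ranked = []
    for text, err in candidates:
        width_fit = 1.0 - err / tolerance_px
        llm = llm_scores.get(text, 0.0)
        ranked.append(Scored(text, err, llm, w_width * width_fit + w_llm * llm))
    return sorted(ranked, key=lambda s: s.composite, reverse=True)


if __name__ == "__main__":
    words = ["nation", "station", "senator", "relation", "hotel", "quiz"]
    # A 40 px redaction box in 12 px text: only "senator" and "relation" fit.
    cands = width_candidates(words, target_px=40.0, font_size_px=12.0)
    # Stand-in for LLM output; a real tool would obtain these from the API.
    scores = {"senator": 0.3, "relation": 0.9}
    for s in rank(cands, scores):
        print(f"{s.text}: width error {s.width_error_px:.2f} px, composite {s.composite:.3f}")
```

In this toy run "relation" fits the box less tightly than "senator" but ranks first because of its higher (invented) plausibility score, which is the kind of trade-off a composite score allows.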

Quick Start & Requirements

Live Demo: Available at unredact.live (requires a Claude API key).
Local Setup: Clone the repository, build the static site with make build-static (requires the Rust toolchain and wasm-pack), then serve it locally with make serve-static.
Prerequisites: Rust toolchain and wasm-pack (for building), Claude API key (for LLM scoring).

Highlighted Details

Client-Side Execution: Runs entirely within the browser as a static site; no backend server is required.
Multi-Modal Guessing: Supports various "solve modes" including general words, names, emails, and full names (with optional custom person databases).
Visual Verification: Allows users to overlay candidate text onto the original document for visual alignment checks.
Probabilistic Approach: Employs heuristic font detection and approximate pixel-width matching for candidate generation.
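One way to picture a solve mode is as a choice of where candidates come from. The Python sketch below illustrates full-name and email modes by generating candidates from a small person list and keeping only those that fit the redaction width. Everything here is an assumption rather than Unredact's code: `PERSONS` stands in for a custom person database, the name and address formats are made up, and a monospaced font replaces heuristic font detection, so width is simply character count times a fixed advance.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical custom person database: (first name, last name) pairs.
PERSONS: List[Tuple[str, str]] = [
    ("Jane", "Doe"), ("John", "Smith"), ("Maria", "Garcia"), ("Li", "Wei"),
]

# Simplifying assumption: a monospaced font, so every glyph has the same advance.
CHAR_WIDTH_PX = 7.2


def width_px(text: str) -> float:
    """Rendered width under the monospace assumption."""
    return len(text) * CHAR_WIDTH_PX


def full_name_candidates(persons: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Hypothetical full-name formats: 'First Last', 'F. Last', 'Last, First'."""
    for first, last in persons:
        yield f"{first} {last}"
        yield f"{first[0]}. {last}"
        yield f"{last}, {first}"


def email_candidates(persons: Iterable[Tuple[str, str]], domain: str) -> Iterator[str]:
    """Hypothetical address patterns: first.last@domain and flast@domain."""
    for first, last in persons:
        yield f"{first}.{last}@{domain}".lower()
        yield f"{first[0]}{last}@{domain}".lower()


def fit(candidates: Iterable[str], target_px: float,
        tolerance_px: float = CHAR_WIDTH_PX / 2) -> List[str]:
    """Keep candidates whose width is within tolerance of the redaction box."""
    return [c for c in candidates if abs(width_px(c) - target_px) <= tolerance_px]


if __name__ == "__main__":
    # A box 10 characters wide matches only "John Smith" among the full names.
    print(fit(full_name_candidates(PERSONS), target_px=10 * CHAR_WIDTH_PX))
    # A box 18 characters wide matches two generated addresses.
    print(fit(email_candidates(PERSONS, "example.com"), target_px=18 * CHAR_WIDTH_PX))
```

Restricting the source of candidates this way keeps the search small: a dozen generated strings are checked instead of an open-ended dictionary, and the width filter removes most of them.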

Maintenance & Community

No specific details regarding active maintenance, community channels (e.g., Discord, Slack), or formal support structures were found in the provided README.

Licensing & Compatibility

License: MIT License.
Compatibility: Permissive; suitable for commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The tool produces probabilistic guesses, not verified facts; its accuracy depends on heuristic font detection and approximate width calculations. It is intended for research and entertainment, and the project explicitly cautions against using it in legal, journalistic, or law enforcement contexts because of potential inaccuracies and the risk of circumventing lawful redactions.
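Why font detection matters can be seen from a simplified width model, an assumption that ignores kerning and ligatures: the predicted width of a string is its summed glyph advances scaled by font size. Here s is the font size in pixels, U the font's units per em, and a(c_i) the advance width of the i-th character in font units.

```latex
w = \frac{s}{U} \sum_{i=1}^{n} a(c_i)
\qquad\Longrightarrow\qquad
\frac{\Delta w}{w} = \frac{\Delta s}{s}
```

Because width is linear in font size, misreading 12 px text as 13 px scales every predicted width by 13/12. For a 40 px redaction that is an error of about 3.3 px, enough to reject the true string or admit wrong ones if the matching tolerance is only a pixel or two.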

Explore Similar Projects

thesis-docx by the-shy123456

ImBD by Jiaqi-Chen-00

epstein-docs.github.io by epstein-docs

De-AI-Prompt-Enhancer-Writer-Booster-SKILL by OUBIGFA

detect-gpt by eric-mitchell

WritingTools by theJayTea

paperless-gpt by icereed

deepdoctection by deepdoctection

AdvancedLiterateMachinery by AlibabaResearch

nlm-ingestor by nlmatics

awesome-deep-text-detection-recognition by hwalsuklee

liteparse by run-llama