unredact by Alex-Gilbert

Guess redacted text in PDFs using AI and constraint solving

Created 1 month ago
310 stars

Top 86.8% on SourcePulse

Project Summary

Unredact is a browser-based research tool that generates plausible guesses for redacted text in PDFs by combining OCR, font-aware constraint solving, and LLM reasoning. It is aimed at users who want to explore what a redaction might conceal. The application runs entirely client-side as a static site with no backend server of its own, which helps privacy and makes it easy to host; LLM scoring still requires a Claude API key.

How It Works

The system rasterizes each PDF page, runs Tesseract.js for OCR, and uses a WASM module to detect redactions and heuristically identify the font and size. A WASM-compiled constraint solver then uses the detected font metrics to enumerate candidate strings whose rendered width matches the pixel width of the redacted region. Finally, the Claude API (or a similar LLM) scores each candidate for contextual plausibility against the surrounding text, and the results are ranked by a composite score.
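
The width-matching and ranking steps can be illustrated with a minimal Python sketch. It is not the project's implementation (the real solver is compiled to WASM): the advance-width table, the linear width-fit function, the tolerance, and the 50/50 weighting are all assumptions made for illustration, and the LLM plausibility scores are passed in as a plain dictionary instead of being fetched from an API.

```python
# Hypothetical advance widths in pixels for one font at one size.
# A real run would derive these from the detected font's metrics.
DEFAULT_PX = 7.0
ADVANCE_PX = {"i": 3.0, "l": 3.0, "m": 11.0, "w": 10.0, " ": 3.5}


def text_width(text: str) -> float:
    """Sum per-character advances (kerning is ignored in this sketch)."""
    return sum(ADVANCE_PX.get(ch, DEFAULT_PX) for ch in text)


def width_fit(text: str, target_px: float, tolerance_px: float) -> float:
    """1.0 for an exact width match, falling linearly to 0.0 at the tolerance edge."""
    error = abs(text_width(text) - target_px)
    return max(0.0, 1.0 - error / tolerance_px)


def rank_candidates(candidates, target_px, plausibility,
                    tolerance_px=2.0, llm_weight=0.5):
    """Keep candidates strictly inside the width tolerance, then sort by a
    weighted blend of width fit and an externally supplied plausibility score."""
    ranked = []
    for text in candidates:
        fit = width_fit(text, target_px, tolerance_px)
        if fit <= 0.0:
            continue  # rendered width is too far from the redaction box
        score = (1.0 - llm_weight) * fit + llm_weight * plausibility.get(text, 0.0)
        ranked.append((text, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Stand-in scores; in the tool these would come from the LLM.
scores = {"smith": 0.2, "jones": 0.9, "miller": 0.6}
for text, score in rank_candidates(["smith", "jones", "will", "miller"], 35.0, scores):
    print(f"{text}: {score:.2f}")
```

In this toy table "smith" and "jones" have the same width, so geometry alone cannot separate them; only the contextual score breaks the tie, which is why the pipeline pairs the solver with an LLM.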

Quick Start & Requirements

  • Live Demo: Available at unredact.live (requires a Claude API key).
  • Local Setup: Clone the repository, build the static site using make build-static (requires Rust toolchain and wasm-pack), then serve locally with make serve-static.
  • Prerequisites: Rust toolchain (for building), Claude API key (for LLM scoring).

Highlighted Details

  • Client-Side Execution: Operates entirely within the browser as a static site, eliminating server dependencies.
  • Multi-Modal Guessing: Supports various "solve modes" including general words, names, emails, and full names (with optional custom person databases).
  • Visual Verification: Allows users to overlay candidate text onto the original document for visual alignment checks.
  • Probabilistic Approach: Employs heuristic font detection and approximate pixel-width matching for candidate generation.
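
As a rough illustration of how a full-name solve mode with a custom person database might combine with approximate width matching, the sketch below pairs first and last names and keeps those whose rendered width falls within a tolerance of the redaction width. The function name, the name lists, the width table, and the tolerance are hypothetical; the project's actual data format and solver interface are not described in the summary.

```python
from itertools import product

# Hypothetical advance widths in pixels; real values depend on the detected font.
DEFAULT_PX = 7.0
ADVANCE_PX = {"i": 3.0, "l": 3.0, "m": 11.0, "w": 10.0, " ": 3.5,
              "I": 4.0, "M": 12.0, "W": 13.0}


def text_width(text: str) -> float:
    """Sum per-character advances (kerning is ignored in this sketch)."""
    return sum(ADVANCE_PX.get(ch, DEFAULT_PX) for ch in text)


def full_name_candidates(first_names, last_names, target_px, tolerance_px=1.0):
    """Return 'First Last' strings whose width is within tolerance_px of
    target_px, closest match first (ties broken alphabetically)."""
    matches = []
    for first, last in product(first_names, last_names):
        name = f"{first} {last}"
        error = abs(text_width(name) - target_px)
        if error <= tolerance_px:
            matches.append((error, name))
    matches.sort()
    return [name for _, name in matches]


# A tiny made-up "person database".
firsts = ["Ann", "Mia", "Will"]
lasts = ["Lee", "Moss", "Ito"]
print(full_name_candidates(firsts, lasts, target_px=46.5))
```

Several unrelated names can land inside the same tolerance window, so a width match narrows the field but does not identify a person; the visual overlay and contextual scoring remain necessary checks.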

Maintenance & Community

No specific details regarding active maintenance, community channels (e.g., Discord, Slack), or formal support structures were found in the provided README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT License permits commercial use and integration into closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

The tool produces probabilistic guesses, not verified facts, and its accuracy depends on heuristic font detection and approximate width calculations. It is intended for research and entertainment. The project explicitly cautions against use in legal, journalistic, or law enforcement contexts because of potential inaccuracies and the risk of circumventing lawful redactions.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 30 days
