Alex-Gilbert
Guess redacted text in PDFs using AI and constraint solving
Top 86.8% on SourcePulse
Unredact is a browser-based research tool that generates plausible guesses for redacted text in PDFs by combining OCR, font-aware constraint solving, and LLM reasoning. It targets users who want to explore or recover potentially hidden information. Processing runs entirely client-side with no server dependencies, which improves privacy and accessibility.
How It Works
The system rasterizes each PDF page, then uses Tesseract.js for OCR and a WASM module for redaction detection and heuristic font and size identification. A WASM-compiled constraint solver uses the detected font metrics to enumerate candidate strings whose rendered width matches the pixel width of the redacted region. Finally, the Claude API (or a similar LLM) scores these candidates for contextual plausibility against the surrounding text, and results are ranked by a composite score.
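The width-matching and ranking steps can be illustrated with a minimal Python sketch. It is not the project's Rust/WASM solver: the per-character widths, the fixed candidate list, the `LLM_SCORES` values standing in for LLM plausibility, and the equal weighting in the composite score are all assumptions made for illustration.

```python
# Hypothetical advance widths in pixels. A real solver would use the
# metrics of the font detected in the PDF, not this toy table.
NARROW = set("ijlt")
WIDE = set("mw")


def char_width(ch):
    """Return the assumed advance width of one character in pixels."""
    if ch == " ":
        return 4.0
    if ch in NARROW:
        return 3.0
    if ch in WIDE:
        return 11.0
    return 9.0 if ch.isupper() else 7.0


def text_width(text):
    """Sum the assumed advance widths of every character in the string."""
    return sum(char_width(ch) for ch in text)


def width_matches(candidates, target_px, tol_px):
    """Keep candidates whose width is within tol_px of the redaction width."""
    return [c for c in candidates if abs(text_width(c) - target_px) <= tol_px]


def rank(candidates, target_px, tol_px, plausibility, width_weight=0.5):
    """Rank width-compatible candidates by a weighted composite score."""
    scored = []
    for cand in width_matches(candidates, target_px, tol_px):
        error = abs(text_width(cand) - target_px)
        # Width fit is 1.0 for an exact match and 0.0 at the tolerance edge.
        width_fit = 1.0 - error / tol_px if tol_px > 0 else 1.0
        context = plausibility.get(cand, 0.0)  # stand-in for an LLM score
        composite = width_weight * width_fit + (1.0 - width_weight) * context
        scored.append((cand, composite))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative inputs: a small name list instead of a full enumeration.
NAMES = ["John Smith", "Jane Doe", "Mary Jones", "Paul Allen", "Emma Stone"]
LLM_SCORES = {"John Smith": 0.4, "Mary Jones": 0.9}  # made-up plausibility
ranking = rank(NAMES, target_px=68, tol_px=3, plausibility=LLM_SCORES)
```

In this toy setup only two names fit a 68-pixel box, and the closer width fit outweighs the higher plausibility score; a different `width_weight` would change that trade-off.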
Quick Start & Requirements
unredact.live (requires a Claude API key).
make build-static (requires Rust toolchain and wasm-pack), then serve locally with make serve-static.
Highlighted Details
Maintenance & Community
No specific details regarding active maintenance, community channels (e.g., Discord, Slack), or formal support structures were found in the provided README.
Licensing & Compatibility
Limitations & Caveats
The tool provides probabilistic guesses, not verified facts, and its accuracy depends on heuristic font detection and approximate width calculations. It is intended for research and entertainment, and is explicitly cautioned against for legal, journalistic, or law enforcement use because of potential inaccuracies and the risk of circumventing lawful redactions.
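A small Python sketch can show why the font heuristics matter. It uses a toy width table, a made-up name list, and a hypothetical 15% font-size misestimate; none of these come from the project. With the size estimated correctly, one name fits the box; with the size misjudged, a different set of names fits instead.

```python
def text_width(text, scale=1.0):
    """Approximate rendered width in pixels under a toy metric table."""
    total = 0.0
    for ch in text:
        if ch == " ":
            total += 4
        elif ch in "ijlt":
            total += 3
        elif ch in "mw":
            total += 11
        elif ch.isupper():
            total += 9
        else:
            total += 7
    # scale models the detected font size relative to the true size.
    return total * scale


def matches(candidates, target_px, tol_px, scale):
    """Return candidates whose scaled width fits the redacted region."""
    return [c for c in candidates
            if abs(text_width(c, scale) - target_px) <= tol_px]


NAMES = ["John Smith", "Jane Doe", "Mary Jones", "Paul Allen", "Lisa Wells"]

# Same 68-pixel redaction box, two different font-size estimates.
correct_size = matches(NAMES, target_px=68, tol_px=2, scale=1.0)
misjudged_size = matches(NAMES, target_px=68, tol_px=2, scale=1.15)
```

Here `correct_size` and `misjudged_size` share no names, so a modest error in the detected size is enough to exclude the right answer before any plausibility scoring happens.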
1 month ago
Inactive