local_ai_ocr by th1nhhdk

Local AI OCR for documents and images

Created 7 months ago

765 stars

Top 44.8% on SourcePulse

Project Summary

This project provides local, offline OCR for images and PDFs using the DeepSeek-OCR AI model. It targets users who prioritize data privacy and security, and ships as a portable, self-contained application that needs no internet connection after initial setup. The primary benefit is on-device text extraction with flexible output formatting.

How It Works

The software runs the DeepSeek-OCR AI model entirely on the user's machine. It automatically detects and uses an available GPU (preferably Nvidia) for accelerated processing, falling back to the CPU if no GPU is available or the GPU is insufficient. Because all processing is local, document data stays on the machine. It offers three processing modes: 'Markdown' aims to preserve document structure such as tables, 'Free OCR' provides enhanced layout preservation, and 'Standard OCR' focuses on basic text extraction.
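The GPU-first, CPU-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `parse_gpu_memory_mib`, `choose_device`, and `detect_device` are hypothetical, and the 8 GB threshold is an assumed cutoff for an "insufficient" GPU. The sketch assumes GPU memory is read with `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints one value in MiB per GPU.

```python
import shutil
import subprocess

# Assumed cutoff for a "sufficient" GPU: 8 GB of VRAM, expressed in MiB.
MIN_VRAM_MIB = 8 * 1024


def parse_gpu_memory_mib(smi_output: str) -> list[int]:
    """Parse the one-integer-per-line MiB values printed by nvidia-smi."""
    values = []
    for line in smi_output.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def choose_device(gpu_memory_mib: list[int], min_vram_mib: int = MIN_VRAM_MIB) -> str:
    """Return "gpu" if any GPU meets the threshold, otherwise "cpu"."""
    if any(mem >= min_vram_mib for mem in gpu_memory_mib):
        return "gpu"
    return "cpu"


def detect_device() -> str:
    """Query nvidia-smi if it is installed; fall back to CPU on any failure."""
    if shutil.which("nvidia-smi") is None:
        return "cpu"
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return "cpu"
    return choose_device(parse_gpu_memory_mib(result.stdout))
```

Keeping the parsing and the decision separate from the `nvidia-smi` call means the fallback rule can be checked on a machine without a GPU.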

Quick Start & Requirements

Installation involves downloading a .zip release, extracting it, and running env_setup.cmd, which downloads the AI model weights (~6.67 GB). The recommended system is Windows 10 or later, a 4-core/8-thread CPU, 16 GB RAM, ~11 GB free disk space, and an Nvidia GPU with at least 8 GB VRAM for optimal performance. The application is started with run.cmd (GPU with CPU fallback) or run_cpu-only.cmd (CPU only).
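Before running the setup script, the disk and CPU figures above can be checked with a short pre-flight script. This is a sketch using only the Python standard library. The helper names `free_disk_gb` and `preflight` are hypothetical and not part of the project, and the sketch assumes the ~11 GB figure means decimal gigabytes.

```python
import os
import shutil

REQUIRED_FREE_GB = 11          # approximate free-space figure from the requirements
RECOMMENDED_LOGICAL_CPUS = 8   # 4 cores / 8 threads


def free_disk_gb(path: str = ".") -> float:
    """Free space on the drive containing `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def preflight(path: str = ".", required_gb: float = REQUIRED_FREE_GB) -> list[str]:
    """Return human-readable warnings; an empty list means no issue was found."""
    warnings = []
    free = free_disk_gb(path)
    if free < required_gb:
        warnings.append(
            f"Only {free:.1f} GB free; about {required_gb} GB is recommended."
        )
    cpus = os.cpu_count() or 0  # cpu_count() may return None
    if cpus < RECOMMENDED_LOGICAL_CPUS:
        warnings.append(
            f"{cpus} logical CPUs detected; {RECOMMENDED_LOGICAL_CPUS} are recommended."
        )
    return warnings


if __name__ == "__main__":
    for message in preflight():
        print("warning:", message)
```

RAM and VRAM are left out because the standard library has no portable way to query them.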

Highlighted Details

Fully offline operation ensures data privacy.
Supports GPU acceleration (Nvidia) with CPU fallback.
Processes various image formats (.png, .jpg, .webp, .heic, .heif) and PDFs.
Intelligent PDF page range selection and a queue system for batch processing.
Offers formatted output options, preserving layout for copy-pasting into applications like Word.
Includes a visual indicator showing the AI's current processing area.
Features automatic AI model unloading to free system memory.
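The PDF page range selection listed above can be illustrated with a small parser. The README summary does not describe the syntax the application accepts, so the comma-and-hyphen form used here (for example `1-3,5`) is an assumption, and `parse_page_ranges` is a hypothetical helper rather than the project's code.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1-3,5" into sorted, unique 1-based page numbers.

    Raises ValueError for malformed parts or pages outside 1..page_count.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_ranges("1-3,5", page_count=10))  # prints [1, 2, 3, 5]
```

Collecting pages in a set means overlapping ranges such as `2,2-4` are processed only once.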

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The README does not specify the software's license or provide compatibility notes for commercial use or integration with closed-source projects.

Limitations & Caveats

The AI may occasionally enter an infinite loop, requiring manual intervention. Users should verify OCR results, especially for critical documents. Initial model loading can cause delays. The order of files added via drag-and-drop is not preserved. Effective GPU utilization requires Nvidia driver version 531 or later.
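The driver requirement can be checked against the version string reported by `nvidia-smi --query-gpu=driver_version --format=csv,noheader` (for example `531.14`). The sketch below assumes "531+" refers to the major version number. The names `driver_major` and `driver_is_supported` are hypothetical.

```python
MIN_DRIVER_MAJOR = 531  # assumed to refer to the major version


def driver_major(version: str) -> int:
    """Extract the major component from a string such as "531.14"."""
    return int(version.strip().split(".")[0])


def driver_is_supported(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """True if the major version meets the minimum; False if it cannot be parsed."""
    try:
        return driver_major(version) >= minimum
    except ValueError:
        return False
```

An unparseable version string is treated as unsupported so that a caller can default to CPU mode.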

Health Check

Last Commit

6 days ago

Responsiveness

Inactive

Star History

23 stars in the last 30 days