local_ai_ocr by th1nhhdk

Local AI OCR for documents and images

Created 3 months ago
707 stars

Top 48.4% on SourcePulse

Project Summary

This project provides offline OCR for images and PDFs using the DeepSeek-OCR AI model, running entirely on the local machine. It targets users who prioritize data privacy and security, and ships as a portable, self-contained application that needs no internet connection after the initial setup. Its main benefit is secure, on-device text extraction with several output formats.

How It Works

The software runs the DeepSeek-OCR AI model entirely on the user's machine. It automatically detects an available GPU (Nvidia preferred) for accelerated processing and falls back to the CPU when no GPU is available or the GPU is insufficient. Because all data stays local, documents are never sent to an external service. It offers three processing modes: 'Markdown' aims to preserve document structure such as tables, 'Free OCR' provides enhanced layout preservation, and 'Standard OCR' focuses on basic text extraction.
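The summary does not say how the app decides between GPU and CPU. A minimal sketch of that decision follows. It assumes a hypothetical `GpuInfo` record and treats figures quoted elsewhere in this summary (8 GB of VRAM, Nvidia driver 531 or newer) as hard cut-offs. That is an assumption, since the README presents them as recommendations and requirements rather than documented fallback rules.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from this summary's stated requirements; treating them
# as hard cut-offs is an assumption made for this sketch.
MIN_VRAM_GB = 8.0
MIN_DRIVER_MAJOR = 531


@dataclass
class GpuInfo:
    """Hypothetical description of a detected Nvidia GPU."""
    name: str
    vram_gb: float
    driver_version: str  # e.g. "531.79"


def driver_major(version: str) -> int:
    """Return the major component of a driver string such as '531.79'."""
    return int(version.split(".")[0])


def choose_device(gpu: Optional[GpuInfo]) -> str:
    """Return 'cuda' when a suitable GPU is present, otherwise fall back to 'cpu'."""
    if gpu is None:
        return "cpu"  # no GPU detected
    if driver_major(gpu.driver_version) < MIN_DRIVER_MAJOR:
        return "cpu"  # driver too old for GPU acceleration
    if gpu.vram_gb < MIN_VRAM_GB:
        return "cpu"  # assumed: too little VRAM to hold the model comfortably
    return "cuda"


print(choose_device(GpuInfo("RTX 3060", 12.0, "546.33")))  # cuda
print(choose_device(GpuInfo("GTX 1650", 4.0, "546.33")))   # cpu
print(choose_device(None))                                  # cpu
```

A real implementation would read these values from its ML framework or from `nvidia-smi` rather than constructing them by hand.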

Quick Start & Requirements

To install, download the .zip release, extract it, and run env_setup.cmd, which downloads the ~6.67 GB of AI model weights. Recommended specs are Windows 10 or later, a 4-core/8-thread CPU, 16 GB of RAM, ~11 GB of free disk space, and, for best performance, an Nvidia GPU with at least 8 GB of VRAM. Launch the app with run.cmd (GPU or CPU) or run_cpu-only.cmd.
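A pre-flight check against the recommended disk space and CPU thread count can be sketched with the Python standard library. The `check_requirements` function and the idea of running such a check before setup are assumptions. The summary does not say whether `env_setup.cmd` validates anything.

```python
import os
import shutil

# Minimums from the stated recommendations (~11 GB disk, 4-core/8-thread CPU).
# The real env_setup.cmd may check different things, or nothing at all.
MIN_FREE_DISK_GB = 11
MIN_LOGICAL_CPUS = 8


def check_requirements(free_disk_bytes: int, logical_cpus: int) -> list[str]:
    """Return human-readable warnings; an empty list means both checks passed."""
    warnings = []
    free_gb = free_disk_bytes / 1024**3  # bytes -> GiB (what Windows shows as "GB")
    if free_gb < MIN_FREE_DISK_GB:
        warnings.append(
            f"Only {free_gb:.1f} GB free; about {MIN_FREE_DISK_GB} GB is recommended."
        )
    if logical_cpus < MIN_LOGICAL_CPUS:
        warnings.append(
            f"{logical_cpus} logical CPUs found; {MIN_LOGICAL_CPUS} are recommended."
        )
    return warnings


if __name__ == "__main__":
    # shutil.disk_usage returns (total, used, free) in bytes for the given path.
    free = shutil.disk_usage(".").free
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    for message in check_requirements(free, cpus):
        print("warning:", message)
```

Returning warnings instead of raising keeps the check advisory, which matches the summary's wording of these figures as recommendations.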

Highlighted Details

  • Fully offline operation ensures data privacy.
  • Supports GPU acceleration (Nvidia) with CPU fallback.
  • Processes various image formats (.png, .jpg, .webp, .heic, .heif) and PDFs.
  • Intelligent PDF page range selection and a queue system for batch processing.
  • Offers formatted output options, preserving layout for copy-pasting into applications like Word.
  • Includes a visual indicator showing the AI's current processing area.
  • Features automatic AI model unloading to free system memory.
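The summary mentions PDF page range selection and a processing queue without describing either. A minimal sketch of both ideas follows. It assumes a hypothetical "1-3, 5" range syntax with 1-based pages and a plain first-in, first-out queue of hypothetical `OcrJob` entries. The app's actual input format and queue behaviour may differ.

```python
from collections import deque
from dataclasses import dataclass


def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Turn a spec like '1-3, 5' into sorted, de-duplicated 1-based page numbers.

    Ranges are clamped to the document's page count; an empty spec means all pages.
    """
    if not spec.strip():
        return list(range(1, page_count + 1))
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '1,,3'
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end:
            start, end = end, start  # accept reversed ranges like '5-2'
        start = max(start, 1)
        end = min(end, page_count)
        pages.update(range(start, end + 1))
    return sorted(pages)


@dataclass
class OcrJob:
    """Hypothetical queue entry: one file plus the pages to process."""
    path: str
    pages: list[int]


queue: deque[OcrJob] = deque()
queue.append(OcrJob("report.pdf", parse_page_range("1-3, 5, 40-99", page_count=42)))
queue.append(OcrJob("scan.png", [1]))  # a single image counts as one page

while queue:
    job = queue.popleft()  # first in, first out
    print(job.path, job.pages)
# report.pdf [1, 2, 3, 5, 40, 41, 42]
# scan.png [1]
```

Clamping out-of-range pages instead of rejecting them is a design choice for this sketch; a stricter parser could raise an error so the user notices a typo.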

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The README does not specify the software's license or provide compatibility notes for commercial use or integration with closed-source projects.

Limitations & Caveats

The AI may occasionally get stuck in an infinite loop, requiring manual intervention. Verify OCR results, especially for critical documents. The initial model load can introduce a delay before processing starts. Files added via drag-and-drop are not kept in their original order. GPU acceleration requires Nvidia driver version 531 or newer.
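The loading delay and the automatic model unloading listed under Highlighted Details are two sides of one trade-off. Keeping the model resident avoids repeated loads but occupies memory. A minimal sketch of lazy loading with an idle timeout follows. It assumes a hypothetical `LazyModel` wrapper, a placeholder loader, and an arbitrary 300-second idle window, since the app's actual timeout is not documented in this summary.

```python
import time
from typing import Any, Callable, Optional


class LazyModel:
    """Load a model on first use and drop it after a period of inactivity.

    `loader` stands in for whatever actually loads the weights (hypothetical).
    `clock` is injectable so the behaviour can be tested without waiting.
    """

    def __init__(self, loader: Callable[[], Any], idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._model: Optional[Any] = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, paying the load cost only if it is not in memory."""
        if self._model is None:
            self._model = self._loader()  # the slow step users notice as a delay
        self._last_used = self._clock()
        return self._model

    def maybe_unload(self) -> bool:
        """Call periodically; frees the model once it has been idle long enough."""
        idle_for = self._clock() - self._last_used
        if self._model is not None and idle_for >= self._idle_seconds:
            self._model = None  # a PyTorch app would typically also free GPU cache
            return True
        return False


# Usage with a fake clock and a placeholder loader.
fake_now = [0.0]
load_count = [0]


def fake_loader() -> str:
    load_count[0] += 1
    return "model-weights"  # placeholder object, not real weights


holder = LazyModel(fake_loader, idle_seconds=300, clock=lambda: fake_now[0])
holder.get()                  # first request loads the model
fake_now[0] = 100.0
holder.get()                  # inside the idle window: reuses the loaded model
fake_now[0] = 450.0
print(holder.maybe_unload())  # True: idle for 350 s, so the model is dropped
print(load_count[0])          # 1
```

The next `get()` after an unload pays the loading cost again, which is the kind of delay the limitation above describes.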

Health Check

Last Commit: 4 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 2

Star History

86 stars in the last 30 days
