Folio-OCR  by vorojar

Local batch OCR workbench for document digitization

Created 2 months ago
331 stars

Top 82.8% on SourcePulse

Project Summary

Summary

Folio-OCR is an open-source batch OCR workbench that runs locally, positioned as a free alternative to commercial tools such as ABBYY FineReader. It is aimed at users digitizing books and documents, and provides layout-aware processing with export to Markdown, TXT, and DOCX from a browser-style three-panel interface.

How It Works

The workbench runs the GLM-OCR model through Ollama and presents results in a three-panel editor. Layout detection partitions each page into regions and merges adjacent text regions, so fewer OCR calls are needed per page. The system also converts LaTeX special characters to Unicode and cleans up the OCR output automatically.
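The region-merging idea can be illustrated with a minimal sketch. This is not the project's code: the `Region` type, the `merge_regions` function, and the `max_gap` threshold are hypothetical names chosen for illustration, and the rule shown (merge boxes that are vertically close and overlap horizontally) is only one plausible way to reduce the number of OCR calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Axis-aligned text box in page pixel coordinates (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def merge_regions(regions: List[Region], max_gap: int = 12) -> List[Region]:
    """Merge vertically adjacent boxes that overlap horizontally.

    Boxes are scanned top to bottom. A box joins the most recent group when
    the vertical gap is at most max_gap pixels and the two boxes share some
    horizontal extent. Each resulting group would need one OCR call.
    """
    merged: List[Region] = []
    for r in sorted(regions, key=lambda b: (b.y0, b.x0)):
        if merged:
            last = merged[-1]
            close = r.y0 - last.y1 <= max_gap
            overlaps = r.x0 < last.x1 and last.x0 < r.x1
            if close and overlaps:
                # Grow the previous group to cover the new box.
                merged[-1] = Region(
                    min(last.x0, r.x0), min(last.y0, r.y0),
                    max(last.x1, r.x1), max(last.y1, r.y1),
                )
                continue
        merged.append(r)
    return merged
```

Because each box is compared only with the most recent group, this sketch suits single-column pages; a multi-column layout would need the regions grouped by column first.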

Quick Start & Requirements

Docker installation: clone the repository, run docker compose up -d, then pull the glm-ocr model with docker compose exec ollama ollama pull glm-ocr. Local installation requires Python 3.10+ and Ollama, followed by pip install -r requirements.txt and python server.py. NVIDIA GPU acceleration can be enabled by uncommenting a section in docker-compose.yml.
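After pulling the model, it can be useful to confirm that Ollama actually lists it. The sketch below uses Ollama's /api/tags endpoint, which returns the locally available models. The helper names `has_model` and `fetch_tags` are hypothetical, and the base URL assumes Ollama's default port 11434 is reachable from the host, which the summary above does not confirm for the Docker setup.

```python
import json
import urllib.request
from typing import Any, Dict


def has_model(tags_payload: Dict[str, Any], name: str = "glm-ocr") -> bool:
    """Return True if a payload from Ollama's /api/tags lists the model."""
    for entry in tags_payload.get("models", []):
        # Model names usually carry a tag suffix such as "glm-ocr:latest".
        if entry.get("name", "").split(":")[0] == name:
            return True
    return False


def fetch_tags(base_url: str = "http://localhost:11434") -> Dict[str, Any]:
    """Query a running Ollama server for its local models.

    The default base_url is an assumption about the deployment.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With a server running, `has_model(fetch_tags())` would indicate whether the pull step succeeded.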

Highlighted Details

  • Batch processing with real-time progress, ETA, and interrupt functionality.
  • Edit/Preview modes with paragraph reflow and export to Markdown, TXT, and DOCX.
  • Data persistence via SQLite (folio_ocr.db), with auto-save and recovery of unsaved edits.
  • Three-panel UI (thumbnail, preview, OCR result) with SSE streaming, bidirectional highlighting, full-text search, and keyboard navigation.
  • Robust network fault tolerance, including request timeouts and non-blocking UI updates.
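
The auto-save and recovery behavior listed above could be approximated as follows. This is a sketch under stated assumptions, not the project's schema: the `page_text` table, its columns, and the function names are all hypothetical. The only detail taken from the summary is that persistence uses SQLite. The example defaults to an in-memory database so it does not touch folio_ocr.db.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) page_text table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_text ("
        " page_id INTEGER PRIMARY KEY,"
        " saved_text TEXT NOT NULL DEFAULT '',"
        " draft_text TEXT)"
    )
    return conn


def autosave_draft(conn: sqlite3.Connection, page_id: int, text: str) -> None:
    """Store an unsaved edit without overwriting the last saved text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO page_text (page_id) VALUES (?)", (page_id,)
        )
        conn.execute(
            "UPDATE page_text SET draft_text = ? WHERE page_id = ?",
            (text, page_id),
        )


def commit_page(conn: sqlite3.Connection, page_id: int) -> None:
    """Promote the draft to saved text and clear the draft."""
    with conn:
        conn.execute(
            "UPDATE page_text SET saved_text = draft_text, draft_text = NULL"
            " WHERE page_id = ? AND draft_text IS NOT NULL",
            (page_id,),
        )


def recover_draft(conn: sqlite3.Connection, page_id: int) -> Optional[str]:
    """Return an unsaved draft left over from an earlier session, if any."""
    row = conn.execute(
        "SELECT draft_text FROM page_text WHERE page_id = ?", (page_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the draft in a separate column means an interrupted session can offer the unsaved edit back to the user while the last explicitly saved text stays intact.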

Maintenance & Community

The project is hosted on GitHub at vorojar/Folio-OCR. The README gives no details about maintainers, community channels (e.g., Discord, Slack), or sponsorships.

Licensing & Compatibility

Folio-OCR is released under the MIT License, a permissive license that allows commercial use and inclusion in closed-source projects.

Limitations & Caveats

The initial model cold start is slow, around 50 seconds. The advertised throughput of about 0.5 s per page requires an NVIDIA GPU.
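
To see how the cold start affects a batch, the two figures above can be combined into a rough estimate. The function below is a back-of-the-envelope sketch: `estimate_batch_seconds` is a hypothetical helper, and it assumes one cold start per batch and a constant per-page time on a GPU, neither of which is guaranteed in practice.

```python
def estimate_batch_seconds(
    pages: int,
    per_page_s: float = 0.5,    # advertised GPU throughput
    cold_start_s: float = 50.0,  # approximate one-time model load
) -> float:
    """Rough wall-clock estimate: one cold start plus a fixed cost per page."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return cold_start_s + pages * per_page_s


# A 300-page book: 50 s cold start + 300 * 0.5 s = 200 s.
print(estimate_batch_seconds(300))  # 200.0
```

For small batches the cold start dominates: 10 pages come to 55 seconds, or 5.5 seconds per page on average.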

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 254 stars in the last 30 days
