Folio-OCR by vorojar

Local batch OCR workbench for document digitization

Created 5 months ago

450 stars

Top 66.1% on SourcePulse

Project Summary

Folio-OCR is an open-source, local batch OCR workbench designed as a free alternative to commercial solutions like ABBYY FineReader. It targets users digitizing books and documents, offering layout-aware processing and multiple export formats from a single interface.

How It Works

The workbench runs GLM-OCR through Ollama and presents results in a three-panel editor. Its main speed-up comes from layout detection: each page is partitioned into regions automatically, and adjacent text regions are merged so that fewer OCR calls are needed. The system also converts LaTeX special characters to Unicode and cleans up the output automatically.
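The region-merging idea can be sketched as follows. This is a minimal illustration, not Folio-OCR's actual algorithm: the Box type, the merge_regions function, and the max_gap pixel threshold are all hypothetical names chosen for the example. It assumes text regions arrive as axis-aligned boxes and greedily merges boxes that overlap horizontally and sit close together vertically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in page pixel coordinates (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def merge_regions(boxes: list[Box], max_gap: int = 12) -> list[Box]:
    """Greedily merge vertically adjacent boxes that overlap horizontally.

    max_gap is an assumed pixel threshold, not a documented Folio-OCR setting.
    """
    merged: list[Box] = []
    # Process boxes top to bottom so each box is compared with the block above it.
    for box in sorted(boxes, key=lambda b: (b.y0, b.x0)):
        if merged:
            last = merged[-1]
            overlaps_x = box.x0 < last.x1 and last.x0 < box.x1
            close_y = box.y0 - last.y1 <= max_gap
            if overlaps_x and close_y:
                # Grow the previous block to cover both boxes.
                merged[-1] = Box(
                    min(last.x0, box.x0), min(last.y0, box.y0),
                    max(last.x1, box.x1), max(last.y1, box.y1),
                )
                continue
        merged.append(box)
    return merged


# Three detected text lines: the first two are 5 px apart, the third is far below.
lines = [Box(50, 100, 500, 120), Box(50, 125, 480, 145), Box(50, 300, 500, 320)]
blocks = merge_regions(lines)
print(len(lines), "regions ->", len(blocks), "OCR calls")  # 3 regions -> 2 OCR calls
```

Each merged block would then be sent to the OCR model once, which is where the reduction in calls comes from. A real layout detector would also need to handle columns, tables, and figures, which this sketch ignores.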

Quick Start & Requirements

Docker is the simplest route: clone the repository, run docker compose up -d, then pull the model with docker compose exec ollama ollama pull glm-ocr. A local installation requires Python 3.10+ and Ollama; install dependencies with pip install -r requirements.txt and start the server with python server.py. NVIDIA GPU acceleration can be enabled by uncommenting the relevant section in docker-compose.yml.
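Once the glm-ocr model has been pulled, it can in principle be queried directly through Ollama's HTTP API. The sketch below uses Ollama's /api/generate endpoint with a base64-encoded page image. It is an illustration only: how Folio-OCR itself talks to Ollama is not described here, the prompt string is an assumption, and the URL assumes Ollama's default port 11434 is reachable from the host. The helper names build_ocr_request and ocr_page are hypothetical.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; assumed to be exposed by the setup above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ocr_request(image_bytes: bytes, model: str = "glm-ocr") -> dict:
    """Build a non-streaming /api/generate payload carrying one page image."""
    return {
        "model": model,
        "prompt": "Text Recognition:",  # assumed prompt, not taken from Folio-OCR
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ocr_page(image_bytes: bytes, timeout: float = 120.0) -> str:
    """Send one page to Ollama and return the recognized text.

    The generous timeout allows for a slow first request while the model loads.
    """
    payload = json.dumps(build_ocr_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Building the payload needs no running server, so it is safe to try offline.
payload = build_ocr_request(b"\x89PNG placeholder bytes")
print(sorted(payload))
```

With the containers running, calling ocr_page(open("page.png", "rb").read()) would return the model's text for that page, where page.png is a placeholder file name.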

Highlighted Details

Batch processing with real-time progress, ETA, and interrupt functionality.
Edit/Preview modes with paragraph reflow and export to Markdown, TXT, and DOCX.
Data persistence via SQLite (folio_ocr.db), with auto-save and recovery of unsaved edits.
Three-panel UI (thumbnail, preview, OCR result) with SSE streaming, bidirectional highlighting, full-text search, and keyboard navigation.
Robust network fault tolerance, including request timeouts and non-blocking UI updates.
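The batch features listed above (progress, ETA, interrupt) can be sketched with a generator and a threading.Event. This is a hedged illustration of one common approach, not the project's implementation: run_batch, the event dictionary fields, and the injectable clock are all hypothetical. The ETA is estimated as the average time per finished page multiplied by the pages remaining.

```python
import threading
import time
from typing import Callable, Iterable, Iterator


def run_batch(
    pages: Iterable[str],
    ocr: Callable[[str], str],
    stop: threading.Event,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Yield one progress event per page until done or until stop is set."""
    pages = list(pages)
    start = clock()
    for done, page in enumerate(pages, start=1):
        if stop.is_set():
            break  # interrupt requested: stop before starting the next page
        text = ocr(page)
        elapsed = clock() - start
        remaining = len(pages) - done
        yield {
            "page": page,
            "text": text,
            "done": done,
            "total": len(pages),
            # Average seconds per page so far, times pages still to process.
            "eta_seconds": elapsed / done * remaining,
        }


# Demo with a fake clock that advances 0.5 s per page and a stand-in OCR function.
ticks = iter([0.0, 0.5, 1.0, 1.5])
events = list(run_batch(["p1", "p2", "p3"], str.upper, threading.Event(), lambda: next(ticks)))
print([event["eta_seconds"] for event in events])  # [1.0, 0.5, 0.0]
```

Each yielded dictionary could be serialized to JSON and written as a server-sent event so the browser updates without blocking, and setting the stop event from another request handler would end the batch after the current page.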

Maintenance & Community

The project is hosted on GitHub at vorojar/Folio-OCR. The README gives no details about maintainers, community channels (e.g., Discord, Slack), or sponsorships.

Licensing & Compatibility

Folio-OCR is released under the MIT License, which permits commercial use and inclusion in closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The initial model cold start is slow, at around 50 seconds. The advertised speed of roughly 0.5 s per page depends on having an NVIDIA GPU.

Explore Similar Projects

filewizard by LoredCast

pdfmd by M1ck4

SmartResume by alibaba

api-llm-ocr by yigitkonur

vision-parse by iamarunbrahma

DeepSeek-OCR-Web by fufankeji

pdf-document-layout-analysis by huridocs

PolyglotPDF by CBIhalsen

OnnxOCR by jingsongliujing

text-extract-api by CatchTheTornado

xberg by xberg-io

surya by datalab-to