yomitoku by kotaro-kinoshita

AI engine for Japanese document image analysis

Created 1 year ago

1,537 stars

Top 26.1% on SourcePulse

Project Summary

YomiToku is an AI-powered document image analysis package designed specifically for Japanese. It addresses the problem of extracting information from scanned documents and images by providing full-text OCR, layout analysis, and table structure recognition. It is aimed at engineers, researchers, and power users who need to process Japanese documents accurately and efficiently, and it turns image-based documents into searchable, structured data.

How It Works

YomiToku uses four custom-trained AI models: character position detection, text recognition, layout analysis, and table structure recognition. These models are trained exclusively on Japanese document datasets, which enables high-precision recognition of over 7,000 Japanese characters, including handwritten text and vertical writing. The design prioritizes preserving the semantic structure of documents during extraction, supported by layout analysis, table structure analysis, and reading order estimation.
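Reading order matters most for vertical writing, where columns run right to left and text within a column runs top to bottom. The sketch below is a deliberately naive heuristic for illustration, not YomiToku's reading-order model; the `Box` type and the `reading_order` function are hypothetical, and boxes are assumed to be axis-aligned pixel coordinates with the origin at the top-left corner.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned text-line box in pixel coordinates (hypothetical schema)."""
    text: str
    x1: float
    y1: float
    x2: float
    y2: float


def reading_order(boxes: list[Box], vertical: bool) -> list[Box]:
    """Naive reading-order heuristic (NOT YomiToku's algorithm).

    Horizontal text: top-to-bottom, then left-to-right.
    Vertical text (tategaki): columns right-to-left, then top-to-bottom.
    """
    if vertical:
        # Rightmost column first (largest right edge); topmost line first within it.
        return sorted(boxes, key=lambda b: (-b.x2, b.y1))
    # Topmost row first; leftmost box first within a row.
    return sorted(boxes, key=lambda b: (b.y1, b.x1))


# Three vertical columns given in scrambled order.
lines = [
    Box("二行目", 100, 0, 130, 300),
    Box("一行目", 150, 0, 180, 300),
    Box("三行目", 50, 0, 80, 300),
]
print([b.text for b in reading_order(lines, vertical=True)])  # ['一行目', '二行目', '三行目']
```

Sorting on exact coordinates breaks down when lines in the same row or column are slightly misaligned, or when a page mixes multiple columns, tables, and figures; those are the cases a dedicated layout and reading-order model is meant to handle.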

Quick Start & Requirements

Install: pip install yomitoku (or pip install yomitoku[extract] for Extractor functionality).
Prerequisites: PyTorch (>= 2.5) with CUDA (>= 11.8; 12.4 or later recommended). A GPU with 8 GB of VRAM is recommended for optimal performance, though CPU inference is supported with the lightweight model. For best accuracy, input images should have a short side of at least 720 px.
Links: Demo results are available in gallery.md. For detailed documentation, refer to the project's documentation pages.
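The 720 px guideline can be applied as a preprocessing check. The helper below is a hypothetical sketch, not part of YomiToku: it computes the size an image would need to be scaled to so that its short side reaches the threshold, keeping the aspect ratio and leaving larger images untouched.

```python
MIN_SHORT_SIDE = 720  # minimum short-side length (px) recommended for input images


def upscale_size(width: int, height: int, min_short: int = MIN_SHORT_SIDE) -> tuple[int, int]:
    """Return a (width, height) whose shorter side is at least `min_short` px.

    The aspect ratio is preserved and images that already meet the threshold are
    returned unchanged. Hypothetical helper; not part of the YomiToku package.
    """
    short = min(width, height)
    if short >= min_short:
        return width, height

    def scaled(side: int) -> int:
        # Integer ceiling division: avoids float rounding turning 720 into 721.
        return -(-side * min_short // short)

    return scaled(width), scaled(height)


print(upscale_size(595, 842))    # (720, 1019)
print(upscale_size(1240, 1754))  # (1240, 1754)
```

With Pillow, the result can be passed directly to `Image.resize`, since both `Image.size` and `resize` use `(width, height)` order. Upscaling only enlarges existing pixels, so when a low-resolution scan is the root problem, rescanning at a higher resolution is the better fix.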

Highlighted Details

Supports OCR, layout analysis, table structure recognition, and reading order estimation for Japanese documents.
Outputs can be converted to HTML, Markdown, JSON, CSV, and searchable PDF formats.
Includes functionality to extract figures and images embedded within documents.
Features YomiToku Extractor for structured data extraction from forms and PDFs using rule-based or LLM-based methods.
Offers a lightweight model for faster CPU inference, limited to 50 characters per line.
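To illustrate what converting a recognized table into Markdown and CSV involves, the sketch below serializes a table given as a simple grid of cell strings. The grid is an assumed, simplified stand-in for a table-recognition result; YomiToku's actual output schema and exporters are not shown here, and `to_markdown` and `to_csv` are hypothetical helpers.

```python
import csv
import io


def to_markdown(rows: list[list[str]]) -> str:
    """Render a rectangular grid of cell strings as a GitHub-style Markdown table.

    The first row is treated as the header. Pipe characters inside cells are
    escaped so they do not break the column layout.
    """
    def fmt(row: list[str]) -> str:
        return "| " + " | ".join(c.replace("|", "\\|") for c in row) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([fmt(header), separator, *(fmt(r) for r in body)])


def to_csv(rows: list[list[str]]) -> str:
    """Serialize the same grid as CSV with the standard csv module."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


table = [["品名", "数量", "単価"], ["りんご", "3", "120"], ["みかん", "5", "80"]]
print(to_markdown(table))
print(to_csv(table))
```

A plain grid cannot express merged cells, and neither can Markdown tables; HTML can, through `rowspan` and `colspan`, which is one reason HTML output is the better target for complex tables.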

Maintenance & Community

Recent releases include v0.10.1 (support for a CPU-optimized, GPU-free OCR model), v0.8.0 (handwritten text recognition support), and v0.5.1 (beta release). Further details and usage instructions can be found in the project's documentation.

Licensing & Compatibility

The source code and model weights are provided under the CC BY-NC-SA 4.0 license. This license permits non-commercial, personal, and research use. Commercial use requires obtaining a separate product license, available for on-premises/local PC deployment or via AWS Marketplace.

Limitations & Caveats

YomiToku is optimized for document OCR and is not designed for scene OCR (e.g., reading text on signs). Low-resolution input images may reduce recognition accuracy. The lightweight model imposes a 50-character limit per line, so it is less suitable for documents with long lines of text.
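Whether the lightweight model is a good fit can be roughly estimated before running OCR from the geometry of the text lines. The sketch below assumes full-width Japanese glyphs are about square, so a horizontal line box holds roughly width / height characters (height / width for vertical text). `estimated_chars` and `needs_full_model` are hypothetical helpers, and the line boxes are assumed to come from a text detector or from measurements on a sample page.

```python
import math

LITE_LINE_LIMIT = 50  # per-line character limit of the lightweight model


def estimated_chars(width: float, height: float, vertical: bool = False) -> int:
    """Rough character count for one text-line box (heuristic, not a YomiToku API).

    Assumes full-width glyphs are about as wide as they are tall, so the box's
    extent along the text direction divided by its extent across it
    approximates the number of characters.
    """
    along, across = (height, width) if vertical else (width, height)
    return math.ceil(along / across)


def needs_full_model(
    line_boxes: list[tuple[float, float]],
    vertical: bool = False,
    limit: int = LITE_LINE_LIMIT,
) -> bool:
    """True if any (width, height) line box likely exceeds the lite model's limit."""
    return any(estimated_chars(w, h, vertical) > limit for w, h in line_boxes)


# Horizontal lines about 30 px tall: a 1,800 px line holds roughly 60 characters.
print(needs_full_model([(900, 30), (1800, 30)]))  # True
print(needs_full_model([(900, 30), (1500, 30)]))  # False
```

Half-width characters such as Latin letters and digits are narrower than full-width glyphs, so lines that mix scripts can hold more characters than this estimate suggests; leaving a margin below 50 is the safer choice.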


Explore Similar Projects

ferrules by AmineDiro

Qianfan-VL by baidubce

Versatile-OCR-Program by raphael-seo

SmartResume by alibaba

DeepSeek-OCR-WebUI by neosun100

awesome-ocr by zacharywhitley

HunyuanOCR by Tencent-Hunyuan

OnnxOCR by jingsongliujing

liteparse by run-llama

dots.ocr by rednote-hilab

chandra by datalab-to

surya by datalab-to