kotaro-kinoshita
AI engine for Japanese document image analysis
Top 30.2% on SourcePulse
YomiToku is an AI-powered document image analysis package designed specifically for the Japanese language. It extracts information from scanned documents and images through full-text OCR, layout analysis, and table structure recognition. The package targets engineers, researchers, and power users who need to process Japanese documents accurately and efficiently, and it turns image-based documents into searchable, structured data.
How It Works
YomiToku employs four custom-trained AI models: character position detection, string recognition, layout analysis, and table structure recognition. These models are trained exclusively on Japanese document datasets, enabling high-precision inference for over 7,000 Japanese characters, including handwritten text and vertical writing. The approach prioritizes preserving the semantic structure of documents during extraction, using layout analysis, table structure analysis, and reading order estimation.
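To make the idea of reading order estimation concrete, here is a minimal sketch in Python. It is not YomiToku's algorithm or API: the `TextBox` type, its fields, and the sorting heuristic are all hypothetical, chosen only to show why horizontal and vertical Japanese text need different orderings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """A detected text region (hypothetical type, not part of YomiToku)."""
    x: float  # left edge of the box
    y: float  # top edge of the box
    text: str  # string produced by the recognition step
    vertical: bool = False  # True for vertical writing


def reading_order(boxes: List[TextBox]) -> List[TextBox]:
    """Order boxes with a naive heuristic (an assumption for illustration).

    Horizontal text: top-to-bottom, then left-to-right.
    Vertical text: right-to-left columns, then top-to-bottom.
    """
    if boxes and all(b.vertical for b in boxes):
        return sorted(boxes, key=lambda b: (-b.x, b.y))
    return sorted(boxes, key=lambda b: (b.y, b.x))


def extract_text(boxes: List[TextBox]) -> str:
    """Join recognized strings in the estimated reading order."""
    return "\n".join(b.text for b in reading_order(boxes))
```

A real layout-aware system would also group boxes into paragraphs, columns, and table cells before ordering them; this sketch covers only the final sorting step.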
Quick Start & Requirements
pip install yomitoku (or pip install yomitoku[extract] for Extractor functionality)
gallery.md
For detailed documentation, refer to the project's documentation pages.
Highlighted Details
YomiToku Extractor for structured data extraction from forms and PDFs using rule-based or LLM-based methods.
Maintenance & Community
Recent releases include v0.10.1 (support for a CPU-optimized OCR model that runs without a GPU), v0.8.0 (handwritten text recognition support), and v0.5.1 (beta release). Further details and usage instructions can be found in the project's documentation.
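Because features arrived in specific releases, a script may want to check the installed version before relying on one. The sketch below uses only the Python standard library; the helper names are hypothetical, and the version thresholds are taken from the release notes summarized above rather than from any YomiToku API.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.10.1' into (0, 10, 1), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch not in "0123456789":
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str = "yomitoku") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def supports_feature(installed: Optional[str], minimum: str) -> bool:
    """True if `installed` is at least `minimum` (numeric comparison)."""
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # v0.8.0 is the release listed above as adding handwritten text support.
    print(supports_feature(installed_version(), "0.8.0"))
```

Comparing parsed tuples rather than raw strings matters here: as plain strings, "0.10.1" sorts before "0.8.0".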
Licensing & Compatibility
The source code and model weights are provided under the CC BY-NC-SA 4.0 license. This license permits non-commercial, personal, and research use. Commercial use requires obtaining a separate product license, available for on-premises/local PC deployment or via AWS Marketplace.
Limitations & Caveats
YomiToku is optimized for document OCR and is not designed for scene OCR (e.g., reading text on signs). Low-resolution input images may lead to decreased recognition accuracy. The lightweight model imposes a 50-character limit per line, making it less suitable for documents with extensive text on single lines.
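One way to judge whether the 50-character limit matters for a given document set is to measure line lengths in a few reference transcriptions. The following is a minimal sketch with hypothetical helper names; it assumes you already have sample lines as Python strings and does not call YomiToku itself.

```python
from typing import Iterable, List, Tuple

# Per-line limit stated above for the lightweight model.
LITE_MAX_CHARS_PER_LINE = 50


def lines_over_limit(
    lines: Iterable[str], limit: int = LITE_MAX_CHARS_PER_LINE
) -> List[Tuple[int, int]]:
    """Return (index, length) for every line longer than `limit` characters."""
    return [(i, len(line)) for i, line in enumerate(lines) if len(line) > limit]


def lite_model_suitable(
    lines: Iterable[str], limit: int = LITE_MAX_CHARS_PER_LINE
) -> bool:
    """True when no sample line exceeds the limit."""
    return not lines_over_limit(lines, limit)
```

Python's `len` counts Unicode code points, so each kanji or kana character counts as one, which matches a per-character limit more closely than a byte count would.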