OpenOCR by Topdu

General OCR toolkit for research and applications

Created 2 years ago

1,404 stars

Top 28.1% on SourcePulse

Project Summary

OpenOCR is a comprehensive open-source toolkit for general OCR research and applications. It integrates a unified training and evaluation benchmark, a commercial-grade OCR system, and a document parsing system. Developed at Fudan University, it aims to bridge academic research and real-world industrial deployment for text detection, recognition, and document understanding.

How It Works

The toolkit comprises three specialized modules: OpenDoc-0.1B for document parsing (layout analysis plus unified recognition); UniRec-0.1B, a 0.1B-parameter model that recognizes text, formulas, and mixed content; and OpenOCR, a practical system built on SVTRv2 for general text detection and recognition. SVTRv2 anchors a benchmark of 24 scene text recognition methods, favors CTC over encoder-decoder architectures, and is trained on large-scale real data for higher accuracy.
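To make the CTC approach concrete, here is a minimal sketch of greedy CTC decoding, the step that turns per-frame class scores into a text string. It is a generic illustration and not OpenOCR's or SVTRv2's actual code: the function name `ctc_greedy_decode`, the choice of index 0 as the blank symbol, and the toy character set are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, charset):
    """Greedy CTC decoding over a sequence of per-frame class scores.

    frame_scores: list of score lists, one per time step; position 0 is the
                  blank and position i (i >= 1) maps to charset[i - 1].
    charset:      string of recognizable characters (hypothetical here).
    """
    # 1. Pick the highest-scoring class at every time step.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in frame_scores]
    # 2. Collapse consecutive repeats of the same class.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(charset[idx - 1] for idx in collapsed if idx != BLANK)


# Toy input: the best path is a, a, blank, a, b, b, blank.
frames = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.80, 0.10, 0.05, 0.05],
]
print(ctc_greedy_decode(frames, "abc"))  # prints "aab"
```

The blank between the second and third "a" frames is what lets CTC emit a doubled letter. Because every frame is decoded independently in a single pass, there is no autoregressive loop, which is one commonly cited reason CTC recognizers run faster than encoder-decoder models.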

Quick Start & Requirements

Quick start guides and local demos are available for OpenDoc-0.1B and OpenOCR. Pre-trained models are accessible via Hugging Face, ModelScope, and PaddleOCR implementations. The provided text does not give specific installation commands or detailed prerequisites (e.g., GPU, CUDA, or Python versions).

Highlighted Details

Key features include OpenDoc-0.1B's high OmniDocBench score at only 0.1B parameters, UniRec-0.1B's unified recognition of text, formulas, and tables in a compact 0.1B model, and OpenOCR's accuracy improvement over PP-OCRv4 at similar speed, along with ONNX export. The SVTRv2 benchmark reports higher scene text recognition accuracy from training on real data than on synthetic data, and it reproduces numerous academic methods. Recent updates highlight new model releases and paper acceptances at top conferences.

Maintenance & Community

Developed by the OCR team from FVL Lab, Fudan University. Specific contributors are listed for reproduced methods. No explicit community channels or roadmap links are provided.

Licensing & Compatibility

The provided README content does not name the open-source license for the OpenOCR toolkit. ONNX model export broadens compatibility with other runtimes.

Limitations & Caveats

Scene Text Detection (STD) and Text Spotting functionalities are marked as "TODO". Some academic method reproductions are pending completion. The code for the Complex Mathematical Expression Recognition (CMER) model is stated as "coming soon."

Explore Similar Projects

AWESOME-OCR-LLM by Yuliang-Liu

mindocr by mindspore-lab

awesome-ocr-resources by ZumingHuang

deepseek-ocr-client by ihatecsv

awesome-ocr by ChanChiChoi

DeepSeek-OCR-WebUI by neosun100

awesome-ocr by zacharywhitley

deepdoctection by deepdoctection

AdvancedLiterateMachinery by AlibabaResearch

DeepSeek-OCR-2 by deepseek-ai

awesome-ocr by wanghaisheng

surya by datalab-to