OpenOCR by Topdu

General OCR toolkit for research and applications

Created 1 year ago
985 stars

Top 37.6% on SourcePulse

Project Summary

Summary

OpenOCR is a comprehensive open-source toolkit for general OCR research and applications, integrating a unified training/evaluation benchmark, commercial-grade OCR, and document parsing systems. Developed by Fudan University, it aims to bridge academic research with real-world industrial deployment for tasks including text detection, recognition, and document understanding.

How It Works

The toolkit comprises three specialized modules: OpenDoc-0.1B for document parsing (layout analysis plus unified recognition); UniRec-0.1B, a 0.1B-parameter model that recognizes text, formulas, and mixed content; and OpenOCR, a practical system built on SVTRv2 for general text detection and recognition. SVTRv2 also anchors a benchmark of 24 scene text recognition methods; it favors CTC over encoder-decoder architectures and is trained on large-scale real data for better accuracy.
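
For readers unfamiliar with CTC, the decoding step it relies on can be sketched in a few lines. This is a generic illustration of greedy CTC decoding, not code from OpenOCR or SVTRv2; the function name `ctc_greedy_decode`, the blank index of 0, and the lowercase charset are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_ids, charset, blank=BLANK):
    """Turn per-frame class indices into text.

    Greedy CTC decoding has two steps: merge consecutive duplicate
    predictions, then drop every blank. Index i (i >= 1) maps to
    charset[i - 1] because index 0 is reserved for the blank.
    """
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    return "".join(charset[idx - 1] for idx in collapsed if idx != blank)


# Hypothetical per-frame argmax output for a word image.
charset = "abcdefghijklmnopqrstuvwxyz"
frames = [0, 8, 8, 0, 5, 12, 12, 0, 12, 15, 15, 0]
print(ctc_greedy_decode(frames, charset))  # hello
```

The blank between the two runs of index 12 is what lets the decoder keep a doubled letter. Because each frame is classified independently, decoding needs no autoregressive loop, which is one common reason CTC models are fast at inference time.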

Quick Start & Requirements

Quick start guides and local demos are available for OpenDoc-0.1B and OpenOCR. Pre-trained models are accessible via Hugging Face, ModelScope, and PaddleOCR implementations. The provided text does not give specific installation commands or detailed prerequisites (e.g., GPU, CUDA, Python versions).

Highlighted Details

Key features include OpenDoc-0.1B's high OmniDocBench score at only 0.1B parameters, UniRec-0.1B's unified text/formula/table recognition in a compact 0.1B model, and OpenOCR's accuracy improvement over PP-OCRv4 at similar speed, plus ONNX export. The SVTRv2 benchmark, trained on real data, improves scene text recognition accuracy over synthetic-data training and reproduces numerous academic methods. Recent updates highlight new model releases and papers accepted at top conferences.

Maintenance & Community

The toolkit is developed by the OCR team from FVL Lab, Fudan University. Specific contributors are listed for reproduced methods. No explicit community channels or roadmap links are provided.

Licensing & Compatibility

The specific open-source license for the OpenOCR toolkit is not mentioned in the provided README content. Compatibility is enhanced through ONNX model export.

Limitations & Caveats

Scene Text Detection (STD) and Text Spotting functionalities are marked as "TODO". Some academic method reproductions are pending completion. The code for the Complex Mathematical Expression Recognition (CMER) model is stated as "coming soon."

Health Check

Last Commit: 4 days ago
Responsiveness: Inactive
Pull Requests (30d): 3
Issues (30d): 6
Star History: 193 stars in the last 30 days
