myocr by robbyzhaox

OCR framework for building custom pipelines

Created 11 months ago

286 stars

Top 91.8% on SourcePulse

Project Summary

MyOCR is an OCR pipeline builder for engineers and researchers who need to build and integrate custom OCR systems. Its modular, extensible framework covers end-to-end OCR development: flexible training, integration of deep learning models, and production-ready deployment.

How It Works

MyOCR provides a unified pipeline for detection and recognition and lets users mix and match components such as models and processors. It uses ONNX Runtime for efficient CPU/GPU inference and supports structured OCR output through integration with large language models such as Qwen for data extraction.
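The mix-and-match idea can be illustrated with a minimal sketch. Every name here (`Pipeline`, `row_detector`, `crop_recognizer`, `upper_recognizer`) is hypothetical and chosen for illustration; none of it is MyOCR's actual API, and the "image" is a toy list of strings so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types and names: they illustrate composition only and are
# not taken from MyOCR.
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)
Image = List[str]                # toy "image": one string per text row

Detector = Callable[[Image], List[Box]]
Recognizer = Callable[[Image, Box], str]


@dataclass(frozen=True)
class TextRegion:
    box: Box
    text: str


@dataclass
class Pipeline:
    """Runs a detector, then a recognizer on each detected box."""
    detector: Detector
    recognizer: Recognizer

    def run(self, image: Image) -> List[TextRegion]:
        return [
            TextRegion(box, self.recognizer(image, box))
            for box in self.detector(image)
        ]


def row_detector(image: Image) -> List[Box]:
    """Toy detector: one box per non-blank row, around its non-space text."""
    boxes: List[Box] = []
    for y, row in enumerate(image):
        stripped = row.strip()
        if stripped:
            left = len(row) - len(row.lstrip())
            boxes.append((left, y, left + len(stripped), y + 1))
    return boxes


def crop_recognizer(image: Image, box: Box) -> str:
    """Toy recognizer: returns the characters inside the box."""
    left, top, right, _bottom = box
    return image[top][left:right]


def upper_recognizer(image: Image, box: Box) -> str:
    """Alternative recognizer, swapped in without touching the detector."""
    return crop_recognizer(image, box).upper()


if __name__ == "__main__":
    page = ["  Invoice 42", "", "Total: 10  "]
    for region in Pipeline(row_detector, crop_recognizer).run(page):
        print(region.box, region.text)
```

Because the pipeline depends only on the two callable signatures, replacing `crop_recognizer` with `upper_recognizer` changes the recognition stage while leaving detection untouched, which is the kind of component swapping the paragraph above describes.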

Quick Start & Requirements

Install: pip install -e . (after cloning the repo)
Requirements: Python 3.11+, CUDA 12.6+ recommended for GPU.
Setup: Clone repo, install dependencies, download pre-trained weights.
Docs: https://robbyzhaox.github.io/myocr/
Demo: https://huggingface.co/spaces/robbyzhaox/myocr
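The requirements above can be checked before installing. The following is a rough preflight sketch, not part of MyOCR: the function names are hypothetical, and looking for `nvidia-smi` on the PATH is only a heuristic for an NVIDIA driver being present. It does not verify the CUDA version.

```python
import shutil
import sys
from typing import Sequence

MIN_PYTHON = (3, 11)  # minimum version taken from the requirements above


def python_ok(version: Sequence[int] = sys.version_info) -> bool:
    """True if the (major, minor) version meets the stated minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def gpu_driver_visible() -> bool:
    """Heuristic only: True if the nvidia-smi tool is found on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Python 3.11+:", "yes" if python_ok() else "no")
    print("NVIDIA driver tool found:", "yes" if gpu_driver_visible() else "no")
```

A "no" on the second line does not block installation, since the GPU is listed as recommended rather than required.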

Highlighted Details

End-to-end OCR development framework with modular components.
Developer-friendly Python APIs and prebuilt pipelines.
ONNX Runtime support for fast CPU/GPU inference.
Structured OCR output via LLM integration (Ollama, OpenAI).
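The structured-output item above can be pictured as a small post-processing step: OCR text goes into a prompt, a language model replies with JSON, and the reply is reduced to the requested fields. This is a sketch under assumptions, not MyOCR's implementation. `build_prompt`, `extract_fields`, and `fake_llm` are hypothetical names, and the model call is stubbed so the example runs offline; a real setup would replace the stub with an Ollama or OpenAI client.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical: any text-in, text-out model client fits this signature.
LLM = Callable[[str], str]


def build_prompt(ocr_text: str, fields: List[str]) -> str:
    """Asks the model for a JSON object with exactly the requested keys."""
    return (
        "Extract the following fields from the OCR text and reply with a "
        f"JSON object containing exactly these keys: {', '.join(fields)}.\n"
        "Use null for any field that is not present.\n\n"
        f"OCR text:\n{ocr_text}"
    )


def extract_fields(
    ocr_text: str, fields: List[str], llm: LLM
) -> Dict[str, Optional[object]]:
    """Calls the model and keeps only the requested keys (missing -> None)."""
    raw = llm(build_prompt(ocr_text, fields))
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from the model")
    return {name: data.get(name) for name in fields}


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call; returns a fixed reply."""
    return '{"invoice_no": "42", "total": "10.00", "extra": "ignored"}'


if __name__ == "__main__":
    text = "Invoice 42\nTotal: 10.00"
    print(extract_fields(text, ["invoice_no", "total", "date"], fake_llm))
```

Filtering the reply down to the requested keys keeps unexpected model output out of downstream code, and fields the model omits come back as `None` instead of raising.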

Maintenance & Community

Active development with recent releases (v0.1.1 on May 17, 2025).
Contribution guidelines provided.

Licensing & Compatibility

Licensed under Apache 2.0.
Permissive license suitable for commercial use and closed-source integration.

Limitations & Caveats

The structured output pipeline requires configuration for LLM APIs (Ollama, OpenAI) and specific model setups. The README mentions a UI (doc-insight-ui) but does not provide a direct link.
