Awesome-Generative-Models-for-OCR by NiceRingNode

Benchmark for generative OCR

Created 1 year ago

272 stars

Top 94.6% on SourcePulse

Summary

OCRGenBench addresses the need for a comprehensive evaluation of generative models' OCR capabilities. It provides a unified benchmark and framework for researchers and developers to assess visual text synthesis, covering text generation, editing, and image-to-image translation tasks, thereby enabling reproducible and multi-dimensional model evaluation.

How It Works

The benchmark combines three task families in a single evaluation: Text-to-Image (T2I) generation, text editing, and OCR-related image-to-image translation. It spans 5 text categories and 33 OCR generative tasks, built on 1,060 human-annotated samples featuring dense text, varied layouts, and bilingual content. A unified metric, OCRGenScore, assesses four dimensions (text accuracy, instruction following, visual quality, and structural consistency), supporting reproducible evaluation across diverse models.
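To make the idea of a multi-dimensional score concrete, here is a minimal sketch of how four per-dimension scores might be folded into one number. It is an illustration only: the README summary does not give the OCRGenScore formula, so the edit-distance-based text accuracy, the `DimensionScores` container, and the unweighted mean in `combined_score` are all assumptions and not the project's actual method.

```python
from dataclasses import dataclass


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def text_accuracy(predicted: str, reference: str) -> float:
    """ASSUMPTION: text accuracy as 1 - normalized edit distance, in [0, 1]."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty strings match trivially
    return 1.0 - edit_distance(predicted, reference) / longest


@dataclass
class DimensionScores:
    """Hypothetical container for the four dimensions, each scaled to [0, 1]."""
    text_accuracy: float
    instruction_following: float
    visual_quality: float
    structural_consistency: float


def combined_score(scores: DimensionScores) -> float:
    """ASSUMPTION: unweighted mean; the real OCRGenScore weighting may differ."""
    parts = [
        scores.text_accuracy,
        scores.instruction_following,
        scores.visual_quality,
        scores.structural_consistency,
    ]
    return sum(parts) / len(parts)
```

In this sketch, the text rendered in a generated image (read back by some OCR step, which is outside the example) would be passed to `text_accuracy` together with the target text, and the remaining three scores would come from whatever judges the benchmark uses for those dimensions.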

Quick Start & Requirements

The provided README focuses on the benchmark's scope and evaluation framework rather than specific installation or execution commands. Detailed prerequisites, dependencies, or setup instructions are not specified.

Highlighted Details

OCRGenBench is presented as the most comprehensive benchmark to date for evaluating OCR generative capabilities.
It encompasses 5 text categories and 33 OCR generative tasks, including T2I generation, text editing, and OCR image-to-image translation.
The benchmark includes 1,060 high-quality, manually annotated samples.
A unified metric, OCRGenScore, is introduced to assess text accuracy, instruction following, visual quality, and structural consistency.
Evaluations have included models such as GPT-4o and Qwen-Image.

Maintenance & Community

The project is associated with the Deep Learning and Vision Computing (DLVC) Lab at South China University of Technology, and the README carries a 2025-2026 copyright notice. The listed contact is eeprzhang@mail.scut.edu.cn. The README provides no community channels (e.g., Discord, Slack) and no roadmap links.

Licensing & Compatibility

The repository's README does not state a software license. Until one is specified, the terms of use, redistribution, and compatibility remain unclear, particularly for commercial applications.

Limitations & Caveats

The project is a benchmark and evaluation framework focused on evaluation methodology, not a deployable or production-ready OCR system. The provided README does not detail unsupported platforms, performance on edge cases, or alpha/beta status.
