Awesome-Generative-Models-for-OCR  by NiceRingNode

Benchmark for generative OCR

Created 1 year ago
257 stars

Top 98.3% on SourcePulse

Summary

OCRGenBench is a unified benchmark and evaluation framework for measuring the OCR capabilities of generative models. It lets researchers and developers assess visual text synthesis across text generation, text editing, and image-to-image translation tasks, supporting reproducible, multi-dimensional model evaluation.

How It Works

The benchmark brings Text-to-Image (T2I) generation, text editing, and OCR-related image-to-image translation into a single framework. It spans 5 text categories and 33 OCR generative tasks, built on 1,060 human-annotated samples that feature dense text, varied layouts, and bilingual content. A unified metric, OCRGenScore, assesses four dimensions: text accuracy, instruction following, visual quality, and structural consistency. Scoring every model on the same dimensions is what makes evaluations reproducible and comparable across diverse models.
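The README summary names the four dimensions behind OCRGenScore but does not give its formula, so the following is only a minimal sketch of how a multi-dimensional score of this kind could be assembled. Everything in it is an assumption made for illustration: the `text_accuracy` helper (one minus normalized edit distance, a common OCR-style measure), the `DimensionScores` container, and the unweighted mean in `combined` are hypothetical and should not be read as the benchmark's actual method.

```python
from dataclasses import dataclass


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def text_accuracy(predicted: str, reference: str) -> float:
    """Hypothetical text-accuracy measure: 1 - normalized edit distance, in [0, 1]."""
    if not predicted and not reference:
        return 1.0
    dist = edit_distance(predicted, reference)
    return 1.0 - dist / max(len(predicted), len(reference))


@dataclass
class DimensionScores:
    """Per-sample scores for the four dimensions named in the summary, each in [0, 1]."""
    text_accuracy: float
    instruction_following: float
    visual_quality: float
    structural_consistency: float

    def combined(self) -> float:
        # Assumption: an unweighted mean. The real OCRGenScore aggregation
        # is not described in the summary and may differ.
        parts = (self.text_accuracy, self.instruction_following,
                 self.visual_quality, self.structural_consistency)
        return sum(parts) / len(parts)


if __name__ == "__main__":
    acc = text_accuracy("OCRGenBnch", "OCRGenBench")
    sample = DimensionScores(acc, 0.5, 0.5, 1.0)
    print(f"text accuracy: {acc:.3f}, combined: {sample.combined():.3f}")
```

In this sketch only the text-accuracy dimension is computed from strings; the other three values are placeholders that would have to come from whatever judges or models the actual framework uses.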

Quick Start & Requirements

The README focuses on the benchmark's scope and evaluation framework rather than on installation or execution commands. It does not specify prerequisites, dependencies, or setup instructions.

Highlighted Details

  • OCRGenBench is presented as the most comprehensive benchmark to date for evaluating OCR generative capabilities.
  • It encompasses 5 text categories and 33 OCR generative tasks, including T2I generation, text editing, and OCR image-to-image translation.
  • The benchmark includes 1,060 high-quality, manually annotated samples.
  • A unified metric, OCRGenScore, is introduced to assess text accuracy, instruction following, visual quality, and structural consistency.
  • Evaluations have included models such as GPT-4o and Qwen-Image.

Maintenance & Community

The project is associated with the Deep Learning and Vision Computing (DLVC) Lab at South China University of Technology and carries a copyright notice dated 2025-2026. The contact address is eeprzhang@mail.scut.edu.cn. The README provides no community channels (e.g., Discord, Slack) or roadmap links.

Licensing & Compatibility

The repository's README does not state a software license. Until one is specified, the terms of use, distribution, and compatibility remain unclear, particularly for commercial applications.

Limitations & Caveats

The project is a benchmark and evaluation framework, not a deployable or production-ready OCR system. The README does not detail unsupported platforms, performance on edge cases, or alpha/beta status.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 10 stars in the last 30 days
