AWESOME-OCR-LLM by Yuliang-Liu

Advancing OCR with Large Language Models

Created 5 months ago

666 stars

Top 49.8% on SourcePulse

Project Summary

This repository is a curated survey of Optical Character Recognition (OCR) research in the context of Large Language Models (LLMs) and Vision-Language Models (VLMs), covering work from 2021 to 2026. It targets researchers, engineers, and practitioners who want to follow the evolving landscape of visual text parsing, understanding, and generation. By consolidating trends, benchmarks, and specialized models in one place, it supports quick technical due diligence for anyone adopting or contributing to work in this field.

How It Works

The project functions as a structured bibliography that catalogs and categorizes recent research papers, models, and benchmarks. It highlights emerging trends such as the shift towards end-to-end VLM-based parsing, the growing importance of document structure and logical understanding over raw accuracy, and the rise of OCR-free document understanding methods. The content is organized into thematic sections covering visual text parsing, understanding, evaluation, and specialized applications, giving an overview of the state of the art.

Quick Start & Requirements

This repository is a curated list of research and trends, not a deployable software project, so it provides no installation or execution instructions.

Highlighted Details

Focuses on research from 2021-2026, with a significant emphasis on advancements from 2023 onwards.
Identifies critical emerging trends: end-to-end VLM parsing, reinforcement learning for layout, OCR-free understanding, compact VLMs, generative OCR risks (hallucination), and the development of document agents.
Covers a broad spectrum of OCR-related tasks, including visual text parsing, semantic understanding, specialized models (math, geometry, tables, charts), document dewarping, physical structure analysis, reading order prediction, scene text spotting, and visual text generation.
Lists numerous recent models and benchmarks, offering a snapshot of the state of the art in document AI research.

Maintenance & Community

The project welcomes community contributions, including pull requests, suggestions, feedback, and corrections. No specific community channels or contributor details are provided.

Licensing & Compatibility

No license information is provided within the README content.

Limitations & Caveats

The README's "Overview" section states that the full survey is "coming soon," so the content is still under development and may be incomplete. This is a curated list of research findings and trends, not a functional tool or framework.

Explore Similar Projects

Awesome-Generative-Models-for-OCR by NiceRingNode

Qianfan-VL by baidubce

Vary-toy by Ucas-HaoranWei

DeepSeek-OCR-Web by fufankeji

HunyuanOCR by Tencent-Hunyuan

OpenOCR by Topdu

mPLUG-DocOwl by X-PLUG

deepdoctection by deepdoctection

AdvancedLiterateMachinery by AlibabaResearch

DeepSeek-OCR-2 by deepseek-ai

dots.ocr by rednote-hilab

surya by datalab-to