llm-based-ocr by yigitkonur

Open-source OCR API leveraging LLMs for document text extraction

Created 1 year ago

876 stars

Top 41.1% on SourcePulse

1 Expert Loves This Project

Dan Guido

Cofounder of Trail of Bits

Project Summary

This project is an open-source OCR API that uses OpenAI's GPT-4 Turbo with Vision to extract text from PDFs. It targets businesses and developers who need to digitize documents, and it relies on parallel page conversion and batched requests to reduce processing time and cost.

How It Works

The API accepts PDF files by upload or by URL and converts their pages to images concurrently using multiprocessing. The images are then sent in batches to GPT-4 Turbo with Vision for text extraction. A retry mechanism with exponential backoff handles API rate limits and transient failures. The extracted text is returned as Markdown for readability.
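The batching and retry steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the names `batched` and `with_backoff`, the default retry count, the base delay, and the jitter range are all assumptions, and the page rendering and the GPT-4 Turbo with Vision request are left out.

```python
import random
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def with_backoff(
    call: Callable[[], R],
    max_retries: int = 5,        # assumed default, not taken from the project
    base_delay: float = 1.0,     # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Run call(), retrying on any exception with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt; a little jitter keeps
            # concurrent workers from retrying in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
    raise AssertionError("unreachable")
```

In a real service each batch of page images would be passed to the model request inside `with_backoff`, and the `except` clause would likely be narrowed to rate-limit and timeout errors rather than every exception.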

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.8+, Git, and Virtualenv (recommended). An OpenAI API key is required; Azure OpenAI credentials are optional.
Setup: Clone repo, create virtual environment, install dependencies, configure .env with API keys.
Run: uvicorn main:app --reload
Demo: https://github.com/user-attachments/assets/6b39f3ea-248e-4c29-ac2e-b57de64d5d65
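Before starting the server, it can help to confirm that the .env file actually contains the required key. The sketch below is an assumption-laden illustration: the variable name `OPENAI_API_KEY` is a common convention but is not confirmed by this summary, and `parse_env` and `missing_keys` are hypothetical helpers rather than part of the project.

```python
from typing import Dict, List, Sequence

# Hypothetical key name; check the project's README for the real ones.
REQUIRED_KEYS = ["OPENAI_API_KEY"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(parse_env(open(".env").read()))` should return an empty list once the configuration is complete.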

Highlighted Details

Utilizes GPT-4 Turbo with Vision for advanced OCR.
Offers parallel PDF conversion and batched image processing.
Includes a retry mechanism with exponential backoff.
Outputs extracted text in Markdown format.
Claims significant cost savings compared to alternatives like CloudConvert.

Maintenance & Community

No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

Licensed under the GNU AGPL v3.0. This is a strong copyleft license whose source-sharing obligations also apply to software offered over a network, which can restrict combining it with proprietary software.

Limitations & Caveats

The GNU AGPL v3.0 license may restrict the project's use in closed-source commercial applications. According to the README, the project changed to this license because its PyMuPDF dependency requires it.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive

Star History

2 stars in the last 30 days