llm-ocr by yigitkonur

Open-source OCR API leveraging LLMs for document text extraction

created 10 months ago
864 stars

Top 42.4% on sourcepulse

Project Summary

This project provides an open-source OCR API that uses OpenAI's GPT-4 Turbo with Vision to extract text from PDFs. It targets businesses and developers who need to digitize documents, and it uses parallel processing and batching to improve throughput and reduce cost.

How It Works

The API accepts PDF files via upload or URL, converting pages to images concurrently using multiprocessing. These images are then processed in batches by GPT-4 Turbo with Vision for accurate text extraction. A retry mechanism with exponential backoff ensures resilience against API rate limits and transient failures. The extracted text is formatted in Markdown for readability.
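The batching step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `make_batches`, `process_in_batches`, and the `extract_batch` callback are hypothetical names, and a thread pool is used here only to keep the sketch self-contained (the summary says the project uses multiprocessing for page conversion).

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_in_batches(pages, batch_size, extract_batch, max_workers=4):
    """Run extract_batch on each batch concurrently and keep page order.

    extract_batch is a hypothetical callback that takes a list of page
    images and returns a list of extracted texts, one per page.
    """
    batches = make_batches(pages, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        results = pool.map(extract_batch, batches)
        return [text for batch_result in results for text in batch_result]
```

Grouping pages into batches reduces the number of model calls, and running batches concurrently hides per-request latency; the right batch size and worker count depend on the API's rate limits.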

Quick Start & Requirements

Highlighted Details

  • Utilizes GPT-4 Turbo with Vision for advanced OCR.
  • Offers parallel PDF conversion and batched image processing.
  • Includes a retry mechanism with exponential backoff.
  • Outputs extracted text in Markdown format.
  • Claims significant cost savings compared to alternatives like CloudConvert.
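The retry mechanism with exponential backoff listed above might look roughly like this. It is a sketch under stated assumptions: `TransientError` and `call_with_backoff` are hypothetical names, the delay parameters are illustrative defaults, and the jitter is an added design choice that the summary does not mention.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit or transient API failure."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so concurrent workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the function can be exercised without real waiting; in production the default `time.sleep` applies.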

Maintenance & Community

No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

Licensed under the GNU AGPL v3.0. The AGPL is a strong copyleft license: derivative works must be released under the same terms, and its network-use clause extends that obligation to software offered as a hosted service.

Limitations & Caveats

The GNU AGPL v3.0 license may restrict the project's use in closed-source commercial applications. The README notes that the license change was required by PyMuPDF, a dependency that is itself distributed under the AGPL.

Health Check

Last commit: 10 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

17 stars in the last 90 days
