api-llm-ocr  by yigitkonur

Open-source OCR API leveraging LLMs for document text extraction

Created 1 year ago
891 stars

Top 40.4% on SourcePulse

Project Summary

This project provides an open-source OCR API that leverages OpenAI's GPT-4 Turbo with Vision for high-quality text extraction from PDFs. It targets businesses and developers needing efficient document digitization, offering advanced features like parallel processing and batching for optimized performance and cost-effectiveness.

How It Works

The API accepts PDF files via upload or URL, converting pages to images concurrently using multiprocessing. These images are then processed in batches by GPT-4 Turbo with Vision for accurate text extraction. A retry mechanism with exponential backoff ensures resilience against API rate limits and transient failures. The extracted text is formatted in Markdown for readability.
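The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `batched`, `ocr_document`, `render_page`, and `extract_batch` are hypothetical names, the batch size and worker count are arbitrary defaults, and a thread pool stands in for the multiprocessing the project is described as using so that the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ocr_document(
    page_count: int,
    render_page: Callable[[int], bytes],          # hypothetical: page index -> image bytes
    extract_batch: Callable[[List[bytes]], str],  # hypothetical: images -> Markdown text
    batch_size: int = 10,                         # assumed default, not from the README
    workers: int = 4,                             # assumed default, not from the README
) -> str:
    """Render pages concurrently, then extract text one batch at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in page order even if rendering finishes out of order.
        images = list(pool.map(render_page, range(page_count)))
    # Each batch stands for one request to the vision model.
    parts = [extract_batch(batch) for batch in batched(images, batch_size)]
    return "\n\n".join(parts)
```

Passing the renderer and the extractor in as callables keeps the sketch independent of any particular PDF library or model client; in a real deployment they would wrap the PDF-to-image conversion and the GPT-4 Turbo with Vision request respectively.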

Quick Start & Requirements

Highlighted Details

  • Utilizes GPT-4 Turbo with Vision for advanced OCR.
  • Offers parallel PDF conversion and batched image processing.
  • Includes a retry mechanism with exponential backoff.
  • Outputs extracted text in Markdown format.
  • Claims significant cost savings compared to alternatives like CloudConvert.
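
A retry mechanism with exponential backoff, as listed above, might look something like the sketch below. It is an assumption-laden illustration rather than the project's implementation: `retry_with_backoff` is a hypothetical helper, the attempt count, delays, and the small random jitter are choices made here, and the broad `except Exception` would normally be narrowed to rate-limit and transient network errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,      # assumed value
    base_delay: float = 1.0,    # seconds before the first retry (assumed)
    max_delay: float = 30.0,    # upper bound on the unjittered delay (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so concurrent workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; production callers would leave it at its default.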

Maintenance & Community

No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

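Licensed under the GNU AGPL v3.0. This is a strong copyleft license: derivative works must be distributed under the same terms, and its network-use clause extends the source-disclosure obligation to modified versions offered to users as a hosted service. Combining the code with proprietary software may therefore be restricted.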

Limitations & Caveats

Because the project is licensed under the GNU AGPL v3.0, its use in closed-source commercial applications may be restricted. The README notes that the move to this license was required by the project's PyMuPDF dependency.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
