Open-source OCR API leveraging LLMs for document text extraction
This project provides an open-source OCR API that uses OpenAI's GPT-4 Turbo with Vision to extract high-quality text from PDFs. It is aimed at businesses and developers who need to digitize documents efficiently, and it uses parallel processing and batching to improve throughput and keep API costs down.
How It Works
The API accepts PDF files via upload or URL and converts the pages to images concurrently using multiprocessing. The images are then sent in batches to GPT-4 Turbo with Vision for text extraction. A retry mechanism with exponential backoff makes the service resilient to API rate limits and transient failures. The extracted text is returned formatted as Markdown for readability.
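The pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: `render_page` is a hypothetical placeholder for PDF rasterization, `call_model` stands in for the GPT-4 Turbo with Vision request, and the batch size, retry count, and delay values are assumed. It shows pages rendered in a process pool, grouped into batches, and each batch request wrapped in exponential backoff (with added random jitter, a common variant the README does not mention).

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def render_page(page_number: int) -> str:
    # Hypothetical stand-in for rasterizing one PDF page to an image.
    # A real version would use a PDF library; a placeholder keeps this runnable.
    return f"image-of-page-{page_number}"


def render_pages_concurrently(page_count: int, workers: int = 4) -> List[str]:
    # Render pages in separate worker processes; map() preserves page order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_page, range(1, page_count + 1)))


def batched(items: Sequence[T], size: int) -> List[List[T]]:
    # Split items into fixed-size batches; the last batch may be shorter.
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def with_retries(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 30.0) -> T:
    # Retry a failing call, doubling the wait after each failure (capped at
    # max_delay) and adding random jitter. Re-raises after the last attempt.
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")


def extract_text(images: Sequence[str], call_model: Callable[[List[str]], str],
                 batch_size: int = 3) -> str:
    # Send each batch to the (hypothetical) model call with retries and
    # join the per-batch Markdown results in page order.
    parts = []
    for batch in batched(images, batch_size):
        parts.append(with_retries(lambda b=batch: call_model(b)))
    return "\n\n".join(parts)
```

In a real deployment the retry wrapper would catch only rate-limit and transient network errors rather than every `Exception`, and `render_pages_concurrently` should be called from inside an `if __name__ == "__main__":` guard so that it also works on platforms that spawn worker processes instead of forking them.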
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Configure: create a .env file with the required API keys.
Run the server: uvicorn main:app --reload
Highlighted Details
Maintenance & Community
No specific community links (Discord/Slack) or roadmap are provided in the README.
Licensing & Compatibility
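Licensed under the GNU AGPL v3.0. This is a strong copyleft license: if you modify the software and let users interact with it over a network, you must make your modified source code available to them. This may restrict combining it with proprietary software.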
Limitations & Caveats
Because the project is licensed under the GNU AGPL v3.0, its use in closed-source commercial applications may be restricted. The README notes that the switch to this license was required by the PyMuPDF dependency.