Document extraction and parsing API using OCR and Ollama models
This project provides an API for extracting text and structured data from various document formats (PDF, Word, PPTX) and images, with capabilities for PII anonymization. It targets developers and users needing to process documents offline, leveraging modern OCR and LLM technologies for high accuracy and data transformation into JSON or Markdown.
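To make the PII-anonymization idea concrete, a prompt along the following lines could be sent to a local LLM. The `anonymize_prompt` helper and its wording are illustrative assumptions; in the actual project, anonymization behavior is driven by user-supplied prompts.

```python
def anonymize_prompt(text: str) -> str:
    """Build an LLM prompt asking for PII to be replaced with placeholders.

    Hypothetical helper for illustration only; not part of the project's API.
    """
    return (
        "Replace all personally identifiable information (names, emails, "
        "phone numbers, addresses) in the following text with generic "
        "placeholders such as [NAME] or [EMAIL]. Return only the edited "
        "text.\n\n" + text
    )

# Build a prompt for a sample document fragment.
prompt = anonymize_prompt("Contact Jane Doe at jane@example.com.")
```

The placeholder convention ([NAME], [EMAIL]) is one common choice; the exact output format depends entirely on the prompt the user supplies.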
How It Works
The API utilizes FastAPI for its web framework and Celery with Redis for asynchronous task processing and caching. It supports multiple OCR strategies, including EasyOCR, MiniCPM-V, and Llama Vision, with an option to integrate with external OCR services like marker-pdf. LLMs (via Ollama) are employed to refine OCR output, correct errors, and extract structured data based on user prompts.
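The OCR-refinement step can be sketched against Ollama's standard HTTP API (`POST /api/generate` on port 11434). The prompt text and function names below are assumptions for illustration; only the endpoint shape and payload fields (`model`, `prompt`, `stream`) are Ollama's documented interface.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_refine_request(model: str, ocr_text: str) -> dict:
    """Build an Ollama /api/generate payload asking the model to clean up
    raw OCR output and return Markdown. Prompt wording is illustrative,
    not the project's actual prompt."""
    return {
        "model": model,
        "prompt": (
            "Fix OCR errors in the text below and return it as clean "
            "Markdown, preserving headings and tables:\n\n" + ocr_text
        ),
        "stream": False,  # ask for a single JSON response, not a token stream
    }


def refine(model: str, ocr_text: str) -> str:
    """Send the payload to a locally running Ollama instance."""
    payload = json.dumps(build_refine_request(model, ocr_text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Building the payload requires no running server:
payload = build_refine_request("llama3.1", "Th1s is sornple OCR outpvt.")
```

Calling `refine()` requires a local Ollama instance with the named model pulled; the `stream: False` flag keeps the example simple at the cost of waiting for the full completion.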
Quick Start & Requirements
- Install: make install, or set up manually with pip install -e .
- Run: make run, or docker-compose up --build
- GPU support requires docker-compose.gpu.yml
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Docker on macOS does not currently support Apple GPUs, so GPU acceleration requires a native (non-Docker) setup. The DISABLE_LOCAL_OLLAMA environment variable is not yet functional within Docker environments.
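For native runs, the variable can simply be exported in the shell before starting the API. The compose fragment below shows where it would be set once Docker support lands; the service name `app` is an assumption about this project's compose file, not taken from it.

```yaml
# docker-compose.override.yml -- sketch only; "app" is an assumed service name.
services:
  app:
    environment:
      - DISABLE_LOCAL_OLLAMA=1  # note: not yet honored inside Docker
```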