OCR package for extracting text from images/PDFs using vision language models via Ollama
Top 26.7% on sourcepulse
This package provides Optical Character Recognition (OCR) capabilities by leveraging state-of-the-art vision-language models through Ollama. It targets developers and users needing to extract text from images and PDFs, offering both a Python library and a Streamlit web application for flexible integration and use.
How It Works
The package uses Ollama to serve one of several vision-language models (LLaVA, Granite3.2-vision, Moondream, MiniCPM-V). Users select a model and process either a single image or a batch, with options for a custom prompt, an output format (Markdown, Plain Text, JSON, Structured, Key-Value, Table), and the language of the text. Image preprocessing is also supported.
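The flow described above can be sketched against Ollama's REST API directly. This is a minimal illustration, not the package's own implementation: the names `FORMAT_PROMPTS`, `build_payload`, and `run_ocr` and the prompt wording are hypothetical, and the sketch assumes an Ollama server on its default local address with the chosen model already pulled.

```python
import base64
import json
import urllib.request

# Hypothetical prompt templates, one per output format.
# The package's real prompts are not shown in the source.
FORMAT_PROMPTS = {
    "markdown": "Extract all text from this image and format it as Markdown.",
    "text": "Extract all text from this image as plain text.",
    "json": "Extract all text from this image and return it as a JSON object.",
}


def build_payload(image_bytes, model, format_type="markdown", language="English"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    prompt = f"{FORMAT_PROMPTS[format_type]} The text is in {language}."
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON response instead of a stream of chunks.
        "stream": False,
    }


def run_ocr(image_path, model="llava", host="http://localhost:11434", **kwargs):
    """Send one image to a local Ollama server and return the model's text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), model, **kwargs)
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Batch processing in this sketch would amount to calling `run_ocr` once per file, for example `[run_ocr(p, format_type="text") for p in paths]`, where `paths` is a list of image paths you supply.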
Quick Start & Requirements
pip install ollama-ocr
ollama pull llama3.2-vision:11b
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README notes that the LLaVA model can sometimes generate incorrect output. It provides no performance benchmarks and does not describe error handling in detail.