Ollama-OCR by imanoop7

OCR package for extracting text from images/PDFs using vision language models via Ollama

Created 1 year ago

2,085 stars

Top 21.2% on SourcePulse


1 Expert Loves This Project

Michael Chiang

Cofounder of Ollama

Project Summary

This package provides Optical Character Recognition (OCR) by running vision-language models served through Ollama. It targets developers and users who need to extract text from images and PDFs, and it offers both a Python library and a Streamlit web application.

How It Works

The core approach uses Ollama to serve vision-language models such as Llama 3.2 Vision, LLaVA, Granite3.2-vision, Moondream, and Minicpm-v. Users select a model and process either a single file or a batch, optionally supplying a custom prompt, an output format (Markdown, Plain Text, JSON, Structured, Key-Value, or Table), and the language of the document. Optional image preprocessing is also supported.
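A request of the kind such a tool sends can be sketched against Ollama's own REST API. This is not ollama-ocr's internal code. It assumes Ollama's default local endpoint (http://localhost:11434) and its /api/generate route, which accepts a model name, a prompt, and base64-encoded images. The prompt templates, the format keys (a subset of the formats listed above), and the function names build_ocr_request and run_ocr are hypothetical.

```python
import base64
import json
import urllib.request

# Hypothetical prompt templates; the package's real prompts are not shown in this summary.
FORMAT_PROMPTS = {
    "markdown": "Extract all text from this image and return it as Markdown.",
    "text": "Extract all text from this image and return it as plain text.",
    "json": "Extract all text from this image and return it as a JSON object.",
    "table": "Extract any tables in this image and return them as Markdown tables.",
}


def build_ocr_request(model, image_bytes, format_type="markdown",
                      custom_prompt=None, language=None):
    """Return the JSON body for a POST to Ollama's /api/generate endpoint."""
    # A custom prompt, if given, replaces the format template.
    prompt = custom_prompt or FORMAT_PROMPTS[format_type]
    if language:
        # Telling the model the document language can help recognition.
        prompt += f" The text is written in {language}."
    body = {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete response object instead of a token stream.
        "stream": False,
    }
    if format_type == "json":
        # Ollama's "format" option constrains the reply to valid JSON.
        body["format"] = "json"
    return json.dumps(body)


def run_ocr(body, host="http://localhost:11434"):
    """Send the request to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming replies carry the model output in the "response" field.
        return json.loads(resp.read())["response"]
```

With Ollama running, something like `run_ocr(build_ocr_request("llama3.2-vision:11b", open("page.png", "rb").read()))` would return the extracted text; the package's Python library and Streamlit app hide this plumbing behind their own interfaces.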

Quick Start & Requirements

Installation: pip install ollama-ocr
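Prerequisites: Ollama must be installed and running, and each model you intend to use must first be pulled (e.g., ollama pull llama3.2-vision:11b).
Resources: Hardware requirements depend on the chosen model; larger models such as llama3.2-vision:11b need considerably more memory than smaller ones such as Moondream.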
Docs: Ollama OCR on Colab, Example Notebook
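Because a missing model is a common first-run failure, it can help to check which models are already pulled before running OCR. The sketch below is an assumption-laden helper, not part of ollama-ocr: it reads Ollama's /api/tags endpoint (the same list `ollama list` prints) on the default port 11434, and the function names fetch_tags and missing_models are hypothetical.

```python
import json
import urllib.request


def fetch_tags(host="http://localhost:11434"):
    """Return the parsed JSON from Ollama's /api/tags (locally pulled models)."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return json.loads(resp.read())


def _normalize(name):
    # Ollama stores an untagged name such as "moondream" as "moondream:latest".
    return name if ":" in name else f"{name}:latest"


def missing_models(tags, required):
    """Return the required model names that have not been pulled yet."""
    installed = {_normalize(m["name"]) for m in tags.get("models", [])}
    return [name for name in required if _normalize(name) not in installed]
```

For example, `missing_models(fetch_tags(), ["llama3.2-vision:11b"])` returns an empty list once that model is pulled; any names it does return can be fetched with `ollama pull <name>`.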

Highlighted Details

Supports PDF and image files.
Offers multiple output formats including structured data and tables.
Includes batch processing with parallel workers and progress tracking.
Provides a Streamlit web application with a drag-and-drop interface.
Allows custom prompts and language specification for enhanced accuracy.
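The summary mentions batch processing with parallel workers and progress tracking but does not describe how the package implements it. A minimal sketch of the general pattern follows, assuming each file is handled by a per-file OCR callable that mostly waits on HTTP calls to Ollama, so a thread pool is sufficient. The names batch_ocr, ocr_fn, and on_progress are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional, Tuple


def batch_ocr(
    paths: Iterable[str],
    ocr_fn: Callable[[str], str],
    max_workers: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run ocr_fn over many files in parallel; return (results, errors) keyed by path."""
    paths = list(paths)
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the file it is processing.
        futures = {pool.submit(ocr_fn, p): p for p in paths}
        # as_completed yields futures in finish order, which drives progress updates.
        for done, fut in enumerate(as_completed(futures), start=1):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                # One unreadable file should not abort the whole batch.
                errors[path] = str(exc)
            if on_progress:
                on_progress(done, len(paths))
    return results, errors
```

Threads fit here because the heavy work happens inside the Ollama server rather than in Python; `max_workers` should stay small, since a single local Ollama instance may process concurrent requests slowly or queue them.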

Maintenance & Community

No specific contributors, sponsorships, or roadmap details are highlighted in the README.

Licensing & Compatibility

MIT License. It permits commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The README notes that the LLaVA model can sometimes generate incorrect output. It does not provide performance benchmarks or describe error handling in detail.

Health Check

Last Commit: 10 months ago
Responsiveness: Inactive

Star History

15 stars in the last 30 days