Ollama-OCR by imanoop7

OCR package for extracting text from images and PDFs using vision-language models via Ollama

Created 9 months ago
2,025 stars

Top 22.0% on SourcePulse

Project Summary

This package performs Optical Character Recognition (OCR) with vision-language models served through Ollama. It is aimed at developers and other users who need to extract text from images and PDFs, and it ships both a Python library and a Streamlit web application, so it can be embedded in code or used interactively.

How It Works

Ollama serves the vision-language models that do the recognition (LLaVA, Llama 3.2 Vision, Granite3.2-vision, Moondream, Minicpm-v). Users pick a model and process either a single image or a batch, optionally supplying a custom prompt, an output format (Markdown, Plain Text, JSON, Structured, Key-Value, Table), and the language of the text. Images can also be preprocessed before they are sent to the model.
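To make the request flow concrete, here is a minimal sketch that sends one image to a vision model through Ollama's REST endpoint `/api/generate`, using only the standard library rather than this package. It assumes an Ollama server at the default address `http://localhost:11434`. The prompt wording in `FORMAT_PROMPTS`, the way the language hint is appended, and the function names are hypothetical illustrations, not Ollama-OCR's actual implementation.

```python
import base64
import json
import urllib.request

# Hypothetical prompt templates, one per output format.
# Ollama-OCR's real prompts are not reproduced here and may differ.
FORMAT_PROMPTS = {
    "markdown": "Extract all text from this image and return it as Markdown.",
    "text": "Extract all text from this image and return it as plain text.",
    "json": "Extract all text from this image and return it as a JSON object.",
    "key_value": "Extract key-value pairs from this image, one 'key: value' per line.",
    "table": "Extract any tables from this image as Markdown tables.",
}


def build_ocr_payload(model, image_bytes, format_type="markdown",
                      custom_prompt=None, language=None):
    """Build a JSON body for Ollama's /api/generate endpoint (hypothetical helper)."""
    # A custom prompt overrides the format template entirely.
    prompt = custom_prompt or FORMAT_PROMPTS[format_type]
    if language:
        # Assumption: the language hint is simply appended to the prompt.
        prompt += f" The text is in {language}."
    return {
        "model": model,
        "prompt": prompt,
        # Ollama takes images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Request a single JSON reply instead of a token stream.
        "stream": False,
    }


def run_ocr(payload, base_url="http://localhost:11434"):
    """POST the payload to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With a model pulled, a call such as `run_ocr(build_ocr_payload("llama3.2-vision:11b", open("scan.png", "rb").read(), format_type="table"))` would return the model's text. Keeping payload construction separate from the network call makes the prompt logic easy to check without a running server.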

Quick Start & Requirements

  • Installation: pip install ollama-ocr
  • Prerequisites: Ollama must be installed and running. Required models need to be pulled via Ollama (e.g., ollama pull llama3.2-vision:11b).
  • Resources: Requires Ollama and downloaded vision models.
  • Docs: Ollama OCR on Colab, Example Notebook
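Because the package fails if Ollama is not running or a model has not been pulled, a pre-flight check can help. The sketch below is an assumption-labeled helper, not part of Ollama-OCR: it queries Ollama's `/api/tags` endpoint, which lists locally available models, and reports which required models still need `ollama pull`. It assumes the default server address, and it treats an untagged name like `llava` as `llava:latest`, which is how Ollama stores untagged pulls.

```python
import json
import urllib.request


def normalize(name):
    # Ollama stores a model pulled without a tag under ":latest".
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_response, required):
    """Return the required model names absent from an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_response.get("models", [])}
    return [r for r in required if normalize(r) not in installed]


def check_ollama(required, base_url="http://localhost:11434"):
    """Hypothetical pre-flight check: raise if Ollama is down or models are missing."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            tags = json.loads(resp.read())
    except OSError as exc:  # URLError and timeouts are both OSError subclasses
        raise RuntimeError(f"Ollama is not reachable at {base_url}: {exc}") from exc
    missing = missing_models(tags, required)
    if missing:
        hint = "; ".join(f"ollama pull {m}" for m in missing)
        raise RuntimeError(f"Missing models, run: {hint}")
```

Calling `check_ollama(["llama3.2-vision:11b"])` before starting a job turns a confusing mid-run failure into an explicit message naming the command to run.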

Highlighted Details

  • Supports PDF and image files.
  • Offers multiple output formats including structured data and tables.
  • Includes batch processing with parallel workers and progress tracking.
  • Provides a Streamlit web application with a drag-and-drop interface.
  • Allows custom prompts and language specification for enhanced accuracy.
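The batch feature's parallel workers and progress tracking can be illustrated with a generic sketch. This is not the package's batch API: `run_batch`, `ocr_fn`, and `on_progress` are hypothetical names, and `ocr_fn` stands in for any single-file OCR call (for example, one that posts to Ollama). Threads are a reasonable choice here because each task mostly waits on the model server rather than using the CPU.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(paths, ocr_fn, max_workers=4, on_progress=None):
    """Apply ocr_fn to each path in parallel (hypothetical helper).

    Returns (results, errors), two dicts keyed by path. A failure on one
    file is recorded in errors and does not stop the rest of the batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr_fn, p): p for p in paths}
        # as_completed yields futures in completion order, so progress
        # advances as soon as any worker finishes.
        for done, fut in enumerate(as_completed(futures), start=1):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                errors[path] = exc
            if on_progress:
                # Called from this loop, i.e. on the submitting thread only.
                on_progress(done, len(futures))
    return results, errors
```

Passing something like `on_progress=lambda d, t: print(f"{d}/{t} files done")` gives a simple text progress readout; a Streamlit front end could update a progress bar from the same callback.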

Maintenance & Community

  • No specific contributors, sponsorships, or roadmap details are highlighted in the README.

Licensing & Compatibility

  • MIT License. Permits commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The README notes that the LLaVA model can sometimes generate incorrect output. Specific performance benchmarks or detailed error handling mechanisms are not provided.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 54 stars in the last 30 days
