ollama-ocr by dwqs

OCR package leveraging Ollama and vision models

Created 10 months ago
251 stars

Top 99.8% on SourcePulse

Project Summary

This package provides Optical Character Recognition (OCR) capabilities by leveraging state-of-the-art vision-language models (VLMs) accessed through Ollama. It targets developers and users needing to extract text from images, offering flexibility with multiple output formats and support for advanced multimodal models like LLaVA, Llama 3.2 Vision, and MiniCPM-V.

How It Works

The project integrates with Ollama, a platform for running large language models locally. Users can pull and run various VLMs, such as LLaVA, Llama 3.2 Vision, and MiniCPM-V, which are capable of understanding both visual and textual input. By feeding images to these models via Ollama, the package extracts text, enabling OCR functionality powered by advanced AI. This approach allows for potentially higher accuracy and richer context extraction compared to traditional OCR methods, especially for complex or visually rich documents.
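To make the flow concrete, here is a minimal sketch of how an image can be submitted to a locally running Ollama instance for text extraction. It targets Ollama's standard `/api/generate` endpoint, which accepts base64-encoded images in an `images` field; the prompt wording and the helper name `build_ocr_request` are illustrative, not taken from this package's source.

```python
import base64


def build_ocr_request(image_path: str, model: str = "llama3.2-vision:11b") -> dict:
    """Build a payload for Ollama's /api/generate endpoint.

    Ollama's API accepts images as base64-encoded strings in the
    "images" field; the prompt asks the vision model to transcribe
    any text it finds. The prompt text here is an assumption, not
    the exact prompt ollama-ocr uses.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": "Extract all text from this image.",
        "images": [image_b64],
        "stream": False,
    }
```

The resulting dictionary can be POSTed as JSON to `http://localhost:11434/api/generate` on a machine where Ollama is running and the model has been pulled; the extracted text arrives in the `response` field of the reply.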

Quick Start & Requirements

  • Installation:
    1. Install Ollama.
    2. Pull required models: ollama pull llama3.2-vision:11b, ollama pull llava:13b, ollama pull minicpm-v:8b.
    3. Clone the repository: git clone git@github.com:dwqs/ollama-ocr.git
    4. Navigate to the directory: cd ollama-ocr
    5. Install dependencies: yarn or npm i
    6. Run the development server: yarn dev or npm run dev
  • Docker: A demo can be run using the debounce/ollama-ocr Docker image.
  • Prerequisites: Ollama, Node.js (for yarn/npm), and the specified Ollama models.

Highlighted Details

  • Supports multiple output formats: Markdown, plain text, and JSON.
  • Utilizes advanced VLMs: LLaVA, Llama 3.2 Vision, and MiniCPM-V 2.6.
  • Designed for local execution via Ollama.
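Since the same vision model serves all three output formats, format selection typically comes down to the prompt sent with the image. The templates below are a hypothetical sketch of that idea, assuming prompt-based formatting; the actual prompts ollama-ocr uses may differ.

```python
# Hypothetical prompt templates for the three supported output formats.
# These illustrate the prompt-per-format approach; they are not copied
# from the ollama-ocr source.
OCR_PROMPTS = {
    "markdown": (
        "Extract all text from this image and format it as Markdown, "
        "preserving headings, lists, and tables."
    ),
    "text": (
        "Extract all text from this image as plain text, "
        "preserving the original line breaks."
    ),
    "json": (
        "Extract all text from this image and return it as a JSON "
        "object with a single 'text' field."
    ),
}


def ocr_prompt(output_format: str) -> str:
    """Return the extraction prompt for a given output format."""
    try:
        return OCR_PROMPTS[output_format]
    except KeyError:
        raise ValueError(
            f"Unsupported format: {output_format!r}; "
            f"choose one of {sorted(OCR_PROMPTS)}"
        )
```

The chosen prompt would then replace the generic one in the request payload sent to Ollama.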

Maintenance & Community

No specific information regarding maintainers, community channels (like Discord/Slack), or roadmap is provided in the README.

Licensing & Compatibility

The project is released under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes.

Limitations & Caveats

The LLaVA model, while powerful, is noted to sometimes generate incorrect output. The setup requires installing and configuring Ollama and downloading potentially large VLM models.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
