ollama-ocr by dwqs

OCR package leveraging Ollama and vision models

Created 10 months ago
251 stars

Top 99.8% on SourcePulse

Project Summary

This package provides Optical Character Recognition (OCR) capabilities by leveraging state-of-the-art vision-language models (VLMs) accessed through Ollama. It targets developers and users needing to extract text from images, offering flexibility with multiple output formats and support for advanced multimodal models like LLaVA, Llama 3.2 Vision, and MiniCPM-V.

How It Works

The project integrates with Ollama, a platform for running large language models locally. Users can pull and run various VLMs, such as LLaVA, Llama 3.2 Vision, and MiniCPM-V, which are capable of understanding both visual and textual input. By feeding images to these models via Ollama, the package extracts text, enabling OCR functionality powered by advanced AI. This approach allows for potentially higher accuracy and richer context extraction compared to traditional OCR methods, especially for complex or visually rich documents.
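To make the flow concrete, here is a minimal sketch of how an image can be submitted to a locally running Ollama instance for text extraction. It targets Ollama's standard `/api/generate` endpoint, which accepts base64-encoded images in an `images` field; the prompt wording and the helper name `build_ocr_request` are illustrative, not taken from this package's source.

```python
import base64


def build_ocr_request(image_path: str, model: str = "llama3.2-vision:11b") -> dict:
    """Build a payload for Ollama's /api/generate endpoint.

    Ollama's API accepts images as base64-encoded strings in the
    "images" field; the prompt asks the vision model to transcribe
    any text it finds. The prompt text here is an assumption, not
    the exact prompt ollama-ocr uses.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": "Extract all text from this image.",
        "images": [image_b64],
        "stream": False,
    }
```

The resulting dictionary can be POSTed as JSON to `http://localhost:11434/api/generate` on a machine where Ollama is running and the model has been pulled; the extracted text arrives in the `response` field of the reply.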

Quick Start & Requirements

  • Installation:
    1. Install Ollama.
    2. Pull required models: ollama pull llama3.2-vision:11b, ollama pull llava:13b, ollama pull minicpm-v:8b.
    3. Clone the repository: git clone git@github.com:dwqs/ollama-ocr.git
    4. Navigate to the directory: cd ollama-ocr
    5. Install dependencies: yarn or npm i
    6. Run the development server: yarn dev or npm run dev
  • Docker: A demo can be run using the debounce/ollama-ocr Docker image.
  • Prerequisites: Ollama, Node.js (for yarn/npm), and the specified Ollama models.

Highlighted Details

  • Supports multiple output formats: Markdown, plain text, and JSON.
  • Utilizes advanced VLMs: LLaVA, Llama 3.2 Vision, and MiniCPM-V 2.6.
  • Designed for local execution via Ollama.
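Since the same vision model serves all three output formats, format selection typically comes down to the prompt sent with the image. The templates below are a hypothetical sketch of that idea, assuming prompt-based formatting; the actual prompts ollama-ocr uses may differ.

```python
# Hypothetical prompt templates for the three supported output formats.
# These illustrate the prompt-per-format approach; they are not copied
# from the ollama-ocr source.
OCR_PROMPTS = {
    "markdown": (
        "Extract all text from this image and format it as Markdown, "
        "preserving headings, lists, and tables."
    ),
    "text": (
        "Extract all text from this image as plain text, "
        "preserving the original line breaks."
    ),
    "json": (
        "Extract all text from this image and return it as a JSON "
        "object with a single 'text' field."
    ),
}


def ocr_prompt(output_format: str) -> str:
    """Return the extraction prompt for a given output format."""
    try:
        return OCR_PROMPTS[output_format]
    except KeyError:
        raise ValueError(
            f"Unsupported format: {output_format!r}; "
            f"choose one of {sorted(OCR_PROMPTS)}"
        )
```

The chosen prompt would then replace the generic one in the request payload sent to Ollama.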

Maintenance & Community

No specific information regarding maintainers, community channels (like Discord/Slack), or roadmap is provided in the README.

Licensing & Compatibility

The project is released under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes.

Limitations & Caveats

The LLaVA model, while powerful, is noted to sometimes generate incorrect output. The setup requires installing and configuring Ollama and downloading potentially large VLM models.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
