ollama-ocr by bytefer

OCR tool using local visual models via Ollama

Created 1 year ago

297 stars

Top 89.4% on SourcePulse

Project Summary

This project provides an Optical Character Recognition (OCR) tool that uses local vision models supported by Ollama, such as Llama 3.2-Vision or MiniCPM-V 2.6, to extract text from images while preserving the original text formatting. It is aimed at developers and researchers who need accurate, privacy-preserving OCR without relying on cloud services.

How It Works

The tool connects to a locally running Ollama server and sends each image file, together with a user-defined prompt, to the specified vision model. It then extracts the recognized text from the model's output and returns it as either plain text or Markdown. Because all processing stays on the local machine, image data never leaves it, and the prompt can be adjusted to suit different OCR tasks.
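The flow described above can be sketched directly against Ollama's HTTP API. This is a minimal illustration, not the package's own implementation: it assumes Ollama is listening on its default address (http://localhost:11434) and uses the `/api/generate` endpoint with a base64-encoded image. The names `buildOcrRequest` and `runOcr`, and the prompt wording, are hypothetical.

```typescript
// Shape of the JSON body accepted by Ollama's POST /api/generate endpoint
// (only the fields used in this sketch).
interface GenerateRequest {
  model: string;
  system: string;
  prompt: string;
  images: string[]; // base64-encoded image bytes
  stream: boolean;
}

// Hypothetical helper: packages image bytes and prompts into a request body.
function buildOcrRequest(
  model: string,
  imageBytes: Uint8Array,
  systemPrompt: string,
): GenerateRequest {
  return {
    model,
    system: systemPrompt,
    prompt: "Extract the text from this image.", // illustrative wording
    images: [Buffer.from(imageBytes).toString("base64")],
    stream: false, // ask for one JSON object instead of a stream
  };
}

// Hypothetical helper: sends the request to a local Ollama server and
// returns the model's text. The base URL is an assumption (Ollama's default).
async function runOcr(
  request: GenerateRequest,
  baseUrl = "http://localhost:11434",
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

In use, the image bytes would typically come from `fs.promises.readFile`, and the system prompt is where the choice between plain-text and Markdown output would be expressed.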

Quick Start & Requirements

Install with npm (npm install ollama-ocr) or pnpm (pnpm add ollama-ocr).
Requires Node.js 18.0+ and a running local Ollama server.
Requires a compatible vision model (e.g., Llama 3.2-Vision or MiniCPM-V 2.6) to be downloaded within Ollama.
Supports JPG, JPEG, and PNG image formats.
Official documentation and usage examples are available in the README.
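
Since only JPG, JPEG, and PNG files are supported, a caller may want to reject other files before contacting the server. The check below is a small sketch of that idea; `isSupportedImage` is a hypothetical helper and is not part of the package's API.

```typescript
// Extensions listed as supported in the requirements above.
const SUPPORTED_EXTENSIONS = new Set([".jpg", ".jpeg", ".png"]);

// Hypothetical helper: returns true when the path ends in a supported
// image extension, compared case-insensitively.
function isSupportedImage(filePath: string): boolean {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) {
    return false; // no extension at all
  }
  return SUPPORTED_EXTENSIONS.has(filePath.slice(dot).toLowerCase());
}
```

This looks only at the file name, not the file contents, so a mislabeled file would still pass.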

Highlighted Details

Uses Ollama-hosted vision models for text recognition, so accuracy depends on the model selected.
Preserves original text formatting and structure in the output.
Offers customizable system prompts for tailored OCR tasks.
Handles common failures such as a missing image file or an unreachable Ollama server.
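
The two failure cases named above can be told apart by their Node.js error codes. The sketch below shows one way to do that and makes no claim about how the package itself reports errors. It assumes a missing file surfaces as `ENOENT` and a refused connection as `ECONNREFUSED`, possibly nested under the error's `cause` (as with Node's built-in `fetch`). Both function names are hypothetical.

```typescript
// Hypothetical helper: walks an error and its `cause` chain looking for a
// string `code` property such as "ENOENT" or "ECONNREFUSED".
function findErrorCode(err: unknown): string | undefined {
  let current: unknown = err;
  for (let depth = 0; depth < 5; depth++) {
    if (typeof current !== "object" || current === null) {
      return undefined;
    }
    const candidate = current as { code?: unknown; cause?: unknown };
    if (typeof candidate.code === "string") {
      return candidate.code;
    }
    current = candidate.cause;
  }
  return undefined;
}

// Hypothetical helper: turns a low-level error into a message a user can act on.
function describeOcrError(err: unknown): string {
  switch (findErrorCode(err)) {
    case "ENOENT":
      return "Image file not found. Check the file path.";
    case "ECONNREFUSED":
      return "Could not connect to the Ollama server. Is it running?";
    default:
      return "OCR failed for an unexpected reason.";
  }
}
```

A caller would wrap the file read and the server request in a single try/catch and pass the caught value to `describeOcrError`.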

Maintenance & Community

The project is maintained by bytefer. Community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under the MIT license.
This permissive license allows commercial use and integration into closed-source projects.

Limitations & Caveats

The tool requires a local Ollama server and specific models to be pre-downloaded and running, which can be resource-intensive. Support is limited to JPG, JPEG, and PNG image formats.

Explore Similar Projects

YomiNinja by matt-m-o

ollama-ocr by dwqs

BetterOCR by junhoyeo

llm-based-ocr by yigitkonur

llm_aided_ocr by Dicklesworthstone

Ollama-OCR by imanoop7

llama-ocr by Nutlope

comic-translate by ogkalu2

STranslate by STranslate

Bob by ripperhe

GOT-OCR2.0 by Ucas-HaoranWei

markitdown by microsoft