OCR tool using local visual models via Ollama
This project provides an Optical Character Recognition (OCR) tool that uses local visual models supported by Ollama, such as Llama 3.2-Vision or MiniCPM-V 2.6, to extract text from images. It is designed for developers and researchers who need accurate, privacy-preserving OCR without relying on cloud services, and it aims to preserve the original formatting of the text.
How It Works
The tool connects to a locally running Ollama server and sends image files, together with user-defined prompts, to the specified visual model. It then processes the model's output and returns the recognized text as either plain text or Markdown. Because prompts are configurable, OCR tasks can be customized, and because all processing stays on the local machine, the data remains private.
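The request flow described above can be sketched against Ollama's REST API. This is a minimal TypeScript sketch, not the package's own code. It assumes the tool calls POST /api/generate on Ollama's default local port 11434 with a base64-encoded image. The prompts, the OutputFormat switch, and the helper names (toBase64, buildGenerateRequest, runOcr) are hypothetical. The HTTP call also needs a runtime with a global fetch, such as Node 18 or later.

```typescript
// Hypothetical output modes mirroring the tool's plain-text vs Markdown options.
type OutputFormat = "text" | "markdown";

// Hypothetical system prompts; the package's real prompts may differ.
const SYSTEM_PROMPTS: Record<OutputFormat, string> = {
  text: "Extract all text from the image. Preserve line breaks and spacing. Output only the text.",
  markdown:
    "Extract all text from the image as Markdown, preserving headings, lists and tables. Output only the Markdown.",
};

// JSON body accepted by Ollama's POST /api/generate endpoint (subset of fields).
interface GenerateRequest {
  model: string;
  prompt: string;
  system: string;
  images: string[]; // base64-encoded image bytes
  stream: boolean;
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Dependency-free base64 encoder, so the sketch does not rely on Node's Buffer or browser btoa.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    // Pad with "=" when the final group has fewer than 3 input bytes.
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

// Builds a non-streaming generate request for one image.
function buildGenerateRequest(model: string, imageBytes: Uint8Array, format: OutputFormat): GenerateRequest {
  return {
    model,
    prompt: "Extract the text from this image.",
    system: SYSTEM_PROMPTS[format],
    images: [toBase64(imageBytes)],
    stream: false,
  };
}

// Sends the request to a local Ollama server and returns the model's text output.
async function runOcr(
  model: string,
  imageBytes: Uint8Array,
  format: OutputFormat,
  host = "http://localhost:11434",
): Promise<string> {
  const res = await fetch(`${host}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(model, imageBytes, format)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  // With stream: false, Ollama replies with one JSON object whose "response" field holds the text.
  const data = (await res.json()) as { response: string };
  return data.response.trim();
}
```

For example, after running ollama pull llama3.2-vision, a call such as runOcr("llama3.2-vision", bytes, "markdown") would return the model's Markdown transcription. Setting stream to false keeps the sketch simple: Ollama returns a single JSON object instead of a stream of partial responses.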
Quick Start & Requirements
npm install ollama-ocr
or pnpm add ollama-ocr
Highlighted Details
Maintenance & Community
The project is maintained by bytefer. Community interaction channels are not explicitly mentioned in the README.
Licensing & Compatibility
Limitations & Caveats
The tool requires a running local Ollama server with the chosen visual model already downloaded, which can be resource-intensive. Input is limited to JPG, JPEG, and PNG images.
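A pre-flight check for the format restriction might look like the following. It is a hypothetical TypeScript sketch. The summary does not say how the tool validates its inputs, so this version checks both the file extension and the file's leading signature bytes (FF D8 FF for JPEG, the 8-byte PNG signature). The helper names are illustrative only.

```typescript
// Extensions listed as supported; the comparison is case-insensitive.
const SUPPORTED_EXTENSIONS = new Set(["jpg", "jpeg", "png"]);

// Hypothetical helper: true if the path ends in .jpg, .jpeg or .png.
function hasSupportedExtension(path: string): boolean {
  const dot = path.lastIndexOf(".");
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.has(path.slice(dot + 1).toLowerCase());
}

// Standard 8-byte PNG file signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: identifies JPEG or PNG content from its first bytes, else null.
function sniffImageType(bytes: Uint8Array): "jpeg" | "png" | null {
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return "jpeg";
  }
  if (bytes.length >= PNG_SIGNATURE.length && PNG_SIGNATURE.every((b, i) => bytes[i] === b)) {
    return "png";
  }
  return null;
}

// Throws before any model call if the file name or content is not JPEG/PNG.
function assertSupportedImage(path: string, bytes: Uint8Array): void {
  if (!hasSupportedExtension(path)) {
    throw new Error(`Unsupported file extension: ${path}`);
  }
  if (sniffImageType(bytes) === null) {
    throw new Error(`File content is not JPEG or PNG: ${path}`);
  }
}
```

Checking the signature as well as the extension catches files that were renamed without being converted. Failing early avoids loading a model for an image it will not accept.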