ollama-ocr by bytefer

OCR tool using local visual models via Ollama

created 8 months ago
287 stars


Project Summary

This project provides an Optical Character Recognition (OCR) tool that uses local, Ollama-supported visual models such as Llama 3.2-Vision or MiniCPM-V 2.6 to extract text from images. It is aimed at developers and researchers who need accurate, privacy-preserving OCR without relying on cloud services, and it tries to preserve the original text formatting.

How It Works

The tool integrates with a locally running Ollama server, sending image files and user-defined prompts to specified visual models. It then processes the model's output to extract and return recognized text, with options for plain text or Markdown formatted output. This approach allows for customizable OCR tasks and ensures data privacy by keeping all processing local.
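A minimal sketch of that round trip, written directly against Ollama's REST endpoint `POST /api/generate` (which accepts base64-encoded images in an `images` array) rather than against the ollama-ocr package's own API, which is not shown here. The helper names `buildGenerateRequest` and `ocrImage`, the default prompt, the default model tag, and the default server URL are illustrative assumptions; the package's internals may differ.

```typescript
import { readFile } from "node:fs/promises";
import { Buffer } from "node:buffer";

// Subset of the JSON body accepted by Ollama's POST /api/generate.
interface GenerateRequest {
  model: string;
  prompt: string;
  system?: string; // optional system prompt, e.g. "answer in Markdown"
  images: string[]; // base64-encoded image data (no data: URL prefix)
  stream: boolean;
}

// Hypothetical helper: builds a non-streaming request for a single image.
function buildGenerateRequest(
  model: string,
  imageBytes: Uint8Array,
  prompt: string,
  system?: string,
): GenerateRequest {
  return {
    model,
    prompt,
    // Only include "system" when the caller supplied one.
    ...(system !== undefined ? { system } : {}),
    images: [Buffer.from(imageBytes).toString("base64")],
    stream: false,
  };
}

// Hypothetical helper: sends one image to a local Ollama server and returns
// the "response" field of the non-streaming reply.
async function ocrImage(
  filePath: string,
  model = "llama3.2-vision", // assumed model tag; must already be pulled
  prompt = "Extract all text from this image, preserving its formatting.",
  system?: string,
  baseUrl = "http://localhost:11434", // Ollama's default listen address
): Promise<string> {
  const bytes = await readFile(filePath);
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(model, bytes, prompt, system)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response.trim();
}
```

Switching between plain-text and Markdown output in this sketch is just a matter of passing a different `system` string; whether ollama-ocr implements its output modes the same way is an assumption.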

Quick Start & Requirements

  • Install with npm (npm install ollama-ocr) or pnpm (pnpm add ollama-ocr).
  • Requires Node.js 18.0+ and a running local Ollama server.
  • Requires a compatible visual model (e.g., Llama 3.2-Vision, minicpm-v) to be already pulled into the local Ollama instance.
  • Supports JPG, JPEG, and PNG image formats.
  • Official documentation and usage examples are available in the README.
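The requirements above can be checked before any image is processed. The sketch below is a hypothetical preflight step, not part of ollama-ocr: it checks the Node.js major version and queries Ollama's `GET /api/tags` endpoint, which lists locally available models by name (for example `minicpm-v:latest`). The function names and the default URL are assumptions.

```typescript
// Subset of the JSON returned by Ollama's GET /api/tags.
interface TagsResponse {
  models: { name: string }[];
}

// Hypothetical helper: Ollama treats an untagged name as ":latest",
// so "minicpm-v" matches "minicpm-v:latest" but not "minicpm-v:8b".
function hasModel(tags: TagsResponse, wanted: string): boolean {
  const want = wanted.includes(":") ? wanted : `${wanted}:latest`;
  return tags.models.some((m) => m.name === want);
}

// Hypothetical helper: compares the major component of a "x.y.z" version.
function nodeMajorAtLeast(version: string, major: number): boolean {
  return Number.parseInt(version.split(".")[0], 10) >= major;
}

// Hypothetical preflight: throws with an actionable message if a requirement is unmet.
async function preflight(model: string, baseUrl = "http://localhost:11434"): Promise<void> {
  if (!nodeMajorAtLeast(process.versions.node, 18)) {
    throw new Error(`Node.js 18+ required, found ${process.versions.node}`);
  }
  // fetch rejects here if no Ollama server is listening at baseUrl.
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const tags = (await res.json()) as TagsResponse;
  if (!hasModel(tags, model)) {
    throw new Error(`Model "${model}" not found locally; run: ollama pull ${model}`);
  }
}
```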

Highlighted Details

  • Utilizes advanced visual models for high-accuracy text recognition.
  • Preserves original text formatting and structure in the output.
  • Offers customizable system prompts for tailored OCR tasks.
  • Includes robust error handling for common issues like file not found or server connection failures.
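The two failure modes named above surface differently in Node.js, so a caller wrapping an OCR call may want to tell them apart. This is a hypothetical sketch, not ollama-ocr's own error handling: it assumes the image is read with `fs.promises.readFile` (which rejects with `code: "ENOENT"` for a missing path) and the server is reached with the built-in `fetch`, which typically rejects with a `TypeError` whose `cause` carries a socket code such as `ECONNREFUSED`.

```typescript
// Hypothetical error categories mirroring the two issues listed above.
type OcrFailure = "file-not-found" | "server-unreachable" | "other";

function classifyError(err: unknown): OcrFailure {
  const code = (err as { code?: string } | null)?.code;
  // fs.promises.readFile rejects with code "ENOENT" when the path does not exist.
  if (code === "ENOENT") return "file-not-found";
  // Node's fetch usually wraps the socket error in `cause`, e.g. ECONNREFUSED
  // when nothing is listening on the Ollama port.
  const causeCode = (err as { cause?: { code?: string } } | null)?.cause?.code;
  if (code === "ECONNREFUSED" || causeCode === "ECONNREFUSED") {
    return "server-unreachable";
  }
  return "other";
}

// Hypothetical helper: turns a caught error into a user-facing message.
function describeFailure(err: unknown, filePath: string): string {
  switch (classifyError(err)) {
    case "file-not-found":
      return `Image not found: ${filePath}`;
    case "server-unreachable":
      return "Cannot reach the Ollama server; is `ollama serve` running?";
    default:
      return `OCR failed: ${String(err)}`;
  }
}
```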

Maintenance & Community

The project is maintained by bytefer. Community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • Licensed under the MIT license.
  • Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool requires a running local Ollama server with the chosen model already downloaded, and running visual models locally can be resource-intensive. Only JPG, JPEG, and PNG images are supported.
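Because only JPG, JPEG, and PNG are supported, a caller may want to reject other files before contacting the model. The summary does not say whether ollama-ocr checks the extension, the file contents, or neither, so the helpers below are hypothetical: one checks the extension, the other inspects the standard JPEG (`FF D8 FF`) and PNG (`89 50 4E 47 0D 0A 1A 0A`) file signatures.

```typescript
import { extname } from "node:path";

// Extensions listed as supported; compared case-insensitively.
const SUPPORTED_EXTENSIONS = new Set([".jpg", ".jpeg", ".png"]);

// Hypothetical helper: extension-based check.
function isSupportedImage(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(extname(filePath).toLowerCase());
}

// Hypothetical helper: content-based check using the leading magic bytes.
function sniffFormat(bytes: Uint8Array): "jpeg" | "png" | null {
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return "jpeg";
  }
  const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length >= pngSignature.length && pngSignature.every((b, i) => bytes[i] === b)) {
    return "png";
  }
  return null;
}
```

The content check catches files whose extension does not match their actual format (for example, a WebP saved as `.png`), at the cost of reading the file first.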

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 90 days
