ollama-ocr  by bytefer

OCR tool using local visual models via Ollama

Created 9 months ago
293 stars

Top 90.1% on SourcePulse

Project Summary

This project provides an Optical Character Recognition (OCR) tool that extracts text from images using local, Ollama-supported visual models such as Llama 3.2-Vision or MiniCPM-V 2.6. It is aimed at developers and researchers who need privacy-preserving OCR without relying on cloud services, and it tries to preserve the original formatting of the text.

How It Works

The tool integrates with a locally running Ollama server, sending image files and user-defined prompts to specified visual models. It then processes the model's output to extract and return recognized text, with options for plain text or Markdown formatted output. This approach allows for customizable OCR tasks and ensures data privacy by keeping all processing local.
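The flow described above can be sketched against Ollama's HTTP chat endpoint. This is a minimal illustration, not the package's actual implementation: the function names buildChatRequest and recognizeText, the user prompt text, and the default base URL http://localhost:11434 are assumptions made for the example. It relies on the global fetch available in Node.js 18+.

```typescript
import { readFile } from "node:fs/promises";

interface ChatMessage {
  role: "system" | "user";
  content: string;
  images?: string[]; // base64-encoded image data
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
}

// Build the JSON body for Ollama's /api/chat endpoint.
// (Hypothetical helper; the user prompt wording is an assumption.)
export function buildChatRequest(
  model: string,
  systemPrompt: string,
  imageBase64: string,
): ChatRequest {
  return {
    model,
    stream: false, // ask for one complete response instead of a stream
    messages: [
      { role: "system", content: systemPrompt },
      {
        role: "user",
        content: "Extract the text from this image.",
        images: [imageBase64],
      },
    ],
  };
}

// Read an image from disk, send it to a local Ollama server,
// and return the text the model produced.
export async function recognizeText(
  filePath: string,
  model: string,
  systemPrompt: string,
  baseUrl = "http://localhost:11434", // assumed default Ollama address
): Promise<string> {
  const imageBase64 = (await readFile(filePath)).toString("base64");
  const response = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, systemPrompt, imageBase64)),
  });
  if (!response.ok) {
    throw new Error(`Ollama returned HTTP ${response.status}`);
  }
  const data = (await response.json()) as { message?: { content?: string } };
  return data.message?.content ?? "";
}
```

Asking for Markdown rather than plain text would, under this sketch, only be a matter of changing the system prompt; the image never leaves the local machine.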

Quick Start & Requirements

  • Install with npm or pnpm: npm install ollama-ocr, or pnpm add ollama-ocr.
  • Requires Node.js 18.0+ and a running local Ollama server.
  • Requires a compatible visual model (e.g., Llama 3.2-Vision, minicpm-v) to be downloaded within Ollama.
  • Supports JPEG (.jpg, .jpeg) and PNG images.
  • Official documentation and usage examples are available in the README.
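The requirements above could be checked before running OCR with a few small helpers. This is a sketch under stated assumptions: isSupportedImage, meetsNodeRequirement, and hasModel are hypothetical names that are not part of ollama-ocr, and the TagsResponse shape reflects only the name field of the model list that Ollama's /api/tags endpoint returns.

```typescript
import { extname } from "node:path";

// File extensions listed as supported in the requirements above.
const SUPPORTED_EXTENSIONS = new Set([".jpg", ".jpeg", ".png"]);

// True if the file extension is one of the supported image types.
export function isSupportedImage(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(extname(filePath).toLowerCase());
}

// True if a Node.js version string such as "v18.17.0" meets the minimum major version.
export function meetsNodeRequirement(version: string, minMajor = 18): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0] ?? "", 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Minimal shape of the model list returned by Ollama's /api/tags endpoint.
interface TagsResponse {
  models: { name: string }[];
}

// True if the wanted model is installed, with or without an explicit tag
// (e.g. "llama3.2-vision" matches "llama3.2-vision:latest").
export function hasModel(tags: TagsResponse, wanted: string): boolean {
  return tags.models.some(
    (m) => m.name === wanted || m.name.split(":")[0] === wanted,
  );
}
```

In practice, meetsNodeRequirement would be called with process.version, and hasModel with the parsed JSON from a GET request to the local server's /api/tags endpoint.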

Highlighted Details

  • Uses Ollama-served visual models (e.g., Llama 3.2-Vision, MiniCPM-V 2.6) for text recognition.
  • Preserves original text formatting and structure in the output.
  • Offers customizable system prompts for tailored OCR tasks.
  • Handles common errors such as a missing file or a failed connection to the Ollama server.

Maintenance & Community

The project is maintained by bytefer. Community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

  • Licensed under the MIT license.
  • Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool requires a running local Ollama server with a compatible visual model already downloaded; running such models locally can be resource-intensive. Only JPEG (.jpg, .jpeg) and PNG images are supported.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
