dwqs/ollama-ocr: OCR package leveraging Ollama and vision models
This package provides Optical Character Recognition (OCR) capabilities by leveraging state-of-the-art vision-language models (VLMs) accessed through Ollama. It targets developers and users needing to extract text from images, offering flexibility with multiple output formats and support for advanced multimodal models like LLaVA, Llama 3.2 Vision, and MiniCPM-V.
How It Works
The project integrates with Ollama, a platform for running large language models locally. Users can pull and run various VLMs, such as LLaVA, Llama 3.2 Vision, and MiniCPM-V, which are capable of understanding both visual and textual input. By feeding images to these models via Ollama, the package extracts text, enabling OCR functionality powered by advanced AI. This approach allows for potentially higher accuracy and richer context extraction compared to traditional OCR methods, especially for complex or visually rich documents.
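As a minimal sketch of that flow (not the package's own API): Ollama exposes a local REST endpoint on port 11434, and its documented /api/generate call accepts base64-encoded images alongside a text prompt. The function name, prompt wording, and file path below are illustrative assumptions.

import { readFile } from "node:fs/promises";

// Minimal sketch: OCR an image by sending it, base64-encoded, to a
// vision model through Ollama's local REST API (Node 18+ for fetch).
// Assumes Ollama is running locally and the model has been pulled.
async function ocrImage(path: string, model = "llama3.2-vision:11b"): Promise<string> {
  const image = (await readFile(path)).toString("base64");
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      // The prompt steers the output format (plain text, markdown, JSON),
      // which is how VLM-based OCR can offer multiple output formats.
      prompt: "Extract all text from this image. Return plain text only.",
      images: [image], // base64-encoded image payload
      stream: false,   // request a single complete response
    }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}

ocrImage("./sample.png").then(console.log).catch(console.error);

Because a language model, rather than a fixed OCR engine, produces the text, changing the prompt (for example, asking for markdown tables) changes the output format without any additional parsing code.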
Quick Start & Requirements
Pull one or more of the supported vision models:
ollama pull llama3.2-vision:11b
ollama pull llava:13b
ollama pull minicpm-v:8b

Clone and run the project:
git clone git@github.com:dwqs/ollama-ocr.git
cd ollama-ocr
yarn (or npm i)
yarn dev (or npm run dev)

A debounce/ollama-ocr Docker image is also available. Requirements: a local Ollama installation, a package manager (yarn/npm), and the specified Ollama models.
Maintenance & Community
No specific information regarding maintainers, community channels (like Discord/Slack), or roadmap is provided in the README.
Licensing & Compatibility
The project is released under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes.
Limitations & Caveats
The LLaVA model, while powerful, is noted to sometimes generate incorrect output. The setup requires installing and configuring Ollama and downloading potentially large VLM models.
Last updated 10 months ago; the repository is marked inactive.