macos-vision-ocr by bytefer

Powerful macOS command-line OCR tool

Created 1 year ago

288 stars

Top 91.0% on SourcePulse

Project Summary

macos-vision-ocr is a command-line OCR tool for macOS that uses Apple's native Vision framework for on-device text recognition. It targets developers and power users who need OCR in macOS workflows or applications. It reports positional data for each recognized text element and supports batch processing of image directories.

How It Works

This tool uses Apple's Vision framework, the image-analysis framework built into macOS. Because images are processed on the device, it needs no external service and avoids the network latency of cloud-based OCR. Each recognized text element is reported with bounding box coordinates and a confidence score, output as structured JSON.
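When drawing or cropping from positional data, the coordinate convention matters. Vision itself reports bounding boxes in normalized units (0 to 1) with the origin at the lower-left corner, whereas most image libraries use pixels with the origin at the top-left. The sketch below shows that conversion. Whether this tool's JSON already applies it is not stated here, so inspect the output first; the function and type names are hypothetical.

```python
from typing import NamedTuple


class PixelRect(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def vision_box_to_pixels(x, y, w, h, image_width, image_height):
    """Convert a normalized, lower-left-origin box to a top-left-origin pixel rect.

    Assumes (x, y) is the lower-left corner of the box in Vision's
    normalized coordinate space, and (w, h) its normalized size.
    """
    left = round(x * image_width)
    # Flip the vertical axis: the box's top edge sits at y + h in Vision's space.
    top = round((1.0 - (y + h)) * image_height)
    return PixelRect(left, top, round(w * image_width), round(h * image_height))
```

For an 800x400 image, a box at x=0.25, y=0.5 with size 0.5x0.25 maps to a 400x100 pixel rectangle whose top-left corner is at (200, 100).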

Quick Start & Requirements

Primary install / run command: Build from source using Swift.
- For Apple Silicon (arm64): swift build -c release --arch arm64
- For Intel (x86_64): swift build -c release --arch x86_64
Usage examples:
- Single image: ./macos-vision-ocr --img <path>
- Batch: ./macos-vision-ocr --img-dir <path> --output-dir <path>
Non-default prerequisites and dependencies: macOS 10.15 or later (13+ recommended), Xcode, Command Line Tools.
Estimated setup time or resource footprint: Setup is a single release build from source. The README gives no build time or resource figures.
Links: GitHub Repository
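For scripted setups, the build and run commands above can be assembled programmatically. The following is a minimal sketch, not part of the project: the helper names are hypothetical, the binary path defaults to the one shown in the usage examples, and requiring --output-dir in batch mode simply mirrors the usage line rather than documented behavior. It only builds argument lists; nothing is executed.

```python
import platform

# Architectures named in the build instructions above.
SUPPORTED_ARCHS = ("arm64", "x86_64")


def build_command(machine=None):
    """Return the `swift build` argument list for the given (or host) architecture."""
    machine = machine or platform.machine()
    if machine not in SUPPORTED_ARCHS:
        raise ValueError(f"unsupported architecture: {machine}")
    return ["swift", "build", "-c", "release", "--arch", machine]


def ocr_command(binary="./macos-vision-ocr", img=None, img_dir=None, output_dir=None):
    """Return the argument list for single-image or batch mode (exactly one of them)."""
    if (img is None) == (img_dir is None):
        raise ValueError("pass exactly one of img or img_dir")
    if img is not None:
        return [binary, "--img", img]
    if output_dir is None:
        # Assumption: batch mode is always paired with --output-dir, as in the usage line.
        raise ValueError("batch mode expects output_dir")
    return [binary, "--img-dir", img_dir, "--output-dir", output_dir]
```

On a Mac, either list can be passed to subprocess.run(cmd, check=True).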

Highlighted Details

Supports multiple image formats: JPG, JPEG, PNG, WEBP.
Offers both single image and batch processing modes.
Features multi-language recognition, supporting 16 languages including English, Chinese, Japanese, Korean, and various European languages.
Outputs detailed JSON with text positions, confidence scores, and image metadata.
Includes a debug mode to visualize detected text bounding boxes directly on images.
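Because the output is JSON with per-element confidence scores, downstream scripts can filter out weak detections. The sketch below illustrates the idea only: the field names observations, text, and confidence are assumptions made for this example, not a schema confirmed by this summary, so adjust them to the tool's actual output.

```python
import json


def confident_texts(result_json, min_confidence=0.5):
    """Return recognized strings whose confidence meets the threshold.

    Assumes a hypothetical layout: {"observations": [{"text": ..., "confidence": ...}]}.
    """
    data = json.loads(result_json)
    return [
        obs["text"]
        for obs in data.get("observations", [])
        if obs.get("confidence", 0.0) >= min_confidence
    ]


# Hypothetical sample shaped like the assumed schema above.
sample = json.dumps({
    "observations": [
        {"text": "Invoice", "confidence": 0.98},
        {"text": "l|l1", "confidence": 0.31},
    ]
})
kept = confident_texts(sample, min_confidence=0.5)
```

The threshold is a tuning choice: a higher value drops more noise at the cost of missing faint but correct text.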

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

License type: MIT License.
Compatibility notes: The MIT license permits commercial use and integration into closed-source applications, provided the copyright and license notice are retained.

Limitations & Caveats

This tool runs only on macOS and must be built from source. Common issues arise from incorrect image paths, unsupported image formats, or insufficient file permissions. Recognition may also fail on images with unclear text, very small text (less than 1% of image height), or text in unsupported languages.
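Several of these issues (bad paths, unsupported formats, unreadable files) can be caught before the tool is invoked. The following pre-flight check is a sketch under stated assumptions: the function name is hypothetical, and judging format by file extension is a simplification for this example, not necessarily how the tool itself decides.

```python
import os
from pathlib import Path

# Formats listed under Highlighted Details.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def preflight(path):
    """Return a list of problems found with an input image path (empty if none)."""
    p = Path(path)
    if not p.is_file():
        return ["path does not exist or is not a regular file"]
    problems = []
    # Assumption: the extension is a good enough proxy for the image format.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not os.access(p, os.R_OK):
        problems.append("file is not readable by the current user")
    return problems
```

An empty list means the path passed these checks. It says nothing about text clarity, text size, or language support, which can only be judged from the OCR result.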

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

14 stars in the last 30 days