ocrs by robertknight

Rust library and CLI tool for OCR

created 1 year ago
1,581 stars

Top 27.0% on sourcepulse

Project Summary

Ocrs is a Rust library and CLI tool for Optical Character Recognition (OCR), aiming to provide a modern, ML-driven engine that requires less preprocessing than traditional OCR systems like Tesseract. It targets developers and users needing to extract text from various image types, with a focus on ease of compilation, cross-platform compatibility (including WebAssembly), and an understandable codebase.

How It Works

Ocrs uses neural network models that are trained in PyTorch and exported to ONNX format, then executed by the RTen engine. This ML-centric approach is designed to improve accuracy across diverse image inputs with minimal preprocessing.

Quick Start & Requirements

  • Install CLI: cargo install ocrs-cli --locked
  • Requires Rust and Cargo.
  • Models are downloaded automatically to ~/.cache/ocrs on first run.
  • CLI Usage Examples
  • Library Usage
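
As a rough illustration of the CLI output modes mentioned in this summary (plain text, JSON layout information, annotated PNG), the sketch below builds an `ocrs` invocation from Rust using only the standard library. The flag names `--json`, `-o`, and `--annotate` are assumptions recalled from the project's README and should be checked against `ocrs --help`; the `OutputMode`, `ocrs_command`, and `args_of` names and the file names are hypothetical.

```rust
use std::path::Path;
use std::process::Command;

/// Output modes described in the summary (hypothetical helper type).
enum OutputMode<'a> {
    /// Print recognized text to stdout.
    Text,
    /// Write JSON layout information to the given file.
    Json(&'a Path),
    /// Write an annotated PNG to the given file.
    Annotated(&'a Path),
}

/// Build (but do not run) an `ocrs` invocation.
/// ASSUMPTION: `--json`, `-o`, and `--annotate` are recalled from the
/// project's README; confirm them with `ocrs --help`.
fn ocrs_command(image: &Path, mode: OutputMode<'_>) -> Command {
    let mut cmd = Command::new("ocrs");
    cmd.arg(image);
    match mode {
        OutputMode::Text => {}
        OutputMode::Json(dest) => {
            cmd.arg("--json").arg("-o").arg(dest);
        }
        OutputMode::Annotated(dest) => {
            cmd.arg("--annotate").arg(dest);
        }
    }
    cmd
}

/// Collect a command's arguments as strings, for logging or inspection.
fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    let cmd = ocrs_command(
        Path::new("image.png"),
        OutputMode::Json(Path::new("content.json")),
    );
    // Show the command line that would be executed.
    println!("ocrs {}", args_of(&cmd).join(" "));
    // To actually run it, `ocrs` must be installed and on PATH;
    // make `cmd` mutable and call `cmd.status()`.
}
```

Building the command separately from running it keeps the sketch usable on machines where `ocrs` is not installed.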

Highlighted Details

  • Supports Latin alphabet characters.
  • Exports models to ONNX for broader runtime compatibility.
  • Offers CLI options for outputting text, JSON layout information, and annotated PNGs.
  • Includes unit and end-to-end tests for pipeline validation.

Maintenance & Community

  • Project appears to be actively maintained by a single primary author.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The repository's LICENSE file should be consulted for definitive terms.

Limitations & Caveats

Ocrs is in early preview, so it may make more errors than established commercial OCR engines. It currently supports only the Latin alphabet, with broader language support planned.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 1

Star History

107 stars in the last 90 days
