TimmyOVO: Rust OCR inference stack with OpenAI compatibility
Summary
This repository provides a Rust implementation of the DeepSeek-OCR inference stack, offering a high-performance command-line interface (CLI) and an OpenAI-compatible HTTP server. It targets users who need efficient, local document-understanding pipelines without Python dependencies, delivering smaller deployable artifacts, memory safety, and unified tooling. OCR can run locally on the CPU, on Apple Metal, or, experimentally, on NVIDIA CUDA GPUs.
How It Works
The project leverages the Candle ML framework for tensor computations, supporting CPU, Apple Metal, and experimental CUDA backends with FlashAttention. It features a vision preprocessing pipeline that builds a square canvas with letterboxing and dynamic tiling for high-resolution crops. Fused SAM and CLIP models extract features, which are then processed by a custom ImageProjector and injected with learned tokens. The text decoding stack is a Candle reimplementation of DeepSeek-V2, incorporating DynamicCache for efficient token streaming. Rewriting in Rust eliminates Python runtime overhead, enabling memory-safe, thread-friendly infrastructure and a unified CLI/server experience.
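To make the backend story concrete, here is a minimal sketch of how the CPU, Metal, and CUDA options might map onto Candle devices via candle_core. The Backend enum and the wiring around it are illustrative assumptions, not the project's actual CLI code; only the candle_core device constructors are taken from the Candle API.

```rust
// Minimal sketch of backend selection with Candle (candle_core).
// The `Backend` enum and surrounding logic are illustrative assumptions;
// the real wiring in deepseek-ocr-cli may differ.
use candle_core::{Device, Result};

enum Backend {
    Cpu,
    Metal, // requires building with `--features metal` on Apple hardware
    Cuda,  // experimental; requires `--features cuda` and an NVIDIA GPU
}

fn select_device(backend: Backend) -> Result<Device> {
    match backend {
        Backend::Cpu => Ok(Device::Cpu),
        Backend::Metal => Device::new_metal(0),
        Backend::Cuda => Device::new_cuda(0),
    }
}
```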
Quick Start & Requirements
Clone the repository (git clone), navigate into it (cd), and fetch dependencies (cargo fetch). Model assets (~6.3GB) download automatically on first use.
Run the CLI with cargo run -p deepseek-ocr-cli --release -- [args], or install it with cargo install --path crates/cli. Run the server with cargo run -p deepseek-ocr-server --release -- [args].
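Once the server is running, it can be queried like any OpenAI-compatible backend. The sketch below sends a chat completion request from Rust using the reqwest (blocking, json) and serde_json crates; the host, port, model identifier, and exact payload the server expects are assumptions, so check the project's documentation before relying on them.

```rust
// Hedged sketch: querying the local server's OpenAI-compatible
// /v1/chat/completions endpoint. Host, port, and model name are assumptions.
// Requires the reqwest (blocking, json) and serde_json crates.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "deepseek-ocr",   // assumed model identifier
        "messages": [
            { "role": "user", "content": "Transcribe the attached document." }
        ]
    });

    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://127.0.0.1:8080/v1/chat/completions") // assumed address
        .json(&body)
        .send()?
        .json()?;

    // OpenAI-style responses carry the generated text here.
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```

How image inputs are attached (for example, as data URLs in a content array) is not specified in this summary, so the example sends only a text prompt.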
Highlighted Details
- OpenAI-compatible HTTP server exposing the /v1/responses and /v1/chat/completions endpoints.
- Apple Metal acceleration (--features metal, --device metal). Experimental CUDA support (--features cuda, --device cuda) is available for NVIDIA GPUs.
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord/Slack), sponsorships, or a public roadmap beyond technical development items are present in the README.
Licensing & Compatibility
This project inherits the licenses of its dependencies and the upstream DeepSeek-OCR model. Users must refer to DeepSeek-OCR/LICENSE for model terms and apply the same restrictions to downstream use. Compatibility for commercial use or closed-source linking is subject to these inherited terms.
Limitations & Caveats
NVIDIA CUDA GPU support is currently in alpha quality and may exhibit rough edges. Some numerical alignment deltas between the Rust implementation and the PyTorch reference (mainly projector normalisation and vision tiling) are still being addressed and are tracked on the roadmap. The server automatically collapses multi-turn chat inputs to the latest user message, which may not suit all conversational use cases.
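To make the last caveat concrete, the sketch below shows the kind of collapsing described: given an OpenAI-style message list, only the most recent user message survives. This is an illustration of the documented behaviour, not the server's actual code.

```rust
// Illustrative only: mimics the documented behaviour of keeping just the
// latest user message from a multi-turn conversation.
#[derive(Debug, Clone)]
struct Message {
    role: String,
    content: String,
}

fn collapse_to_latest_user(messages: &[Message]) -> Option<Message> {
    messages.iter().rev().find(|m| m.role == "user").cloned()
}

fn main() {
    let history = vec![
        Message { role: "user".into(), content: "Read page 1.".into() },
        Message { role: "assistant".into(), content: "(page 1 text)".into() },
        Message { role: "user".into(), content: "Now read page 2.".into() },
    ];
    // Only "Now read page 2." would reach the OCR pipeline.
    println!("{:?}", collapse_to_latest_user(&history));
}
```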