deepseek-ocr.rs by TimmyOVO

Rust OCR inference stack with OpenAI compatibility

Created 1 week ago


1,619 stars

Top 25.8% on SourcePulse

Project Summary

Summary

This repository provides a Rust implementation of the DeepSeek-OCR inference stack, offering a high-performance command-line interface (CLI) and an OpenAI-compatible HTTP server. It targets users who need efficient, local document-understanding pipelines without Python dependencies, delivering smaller deployable artifacts, memory safety, and unified tooling. OCR runs locally on CPU, on Apple Metal, or, experimentally, on NVIDIA CUDA GPUs.

How It Works

The project leverages the Candle ML framework for tensor computation, supporting CPU, Apple Metal, and experimental CUDA backends with FlashAttention. Its vision preprocessing pipeline letterboxes each image onto a square canvas and dynamically tiles high-resolution inputs into crops. Fused SAM and CLIP models extract visual features, which are then processed by a custom ImageProjector and combined with learned tokens. The text decoding stack is a Candle reimplementation of DeepSeek-V2, incorporating a DynamicCache for efficient token streaming. Rewriting the stack in Rust eliminates Python runtime overhead, yielding memory-safe, thread-friendly infrastructure and a unified CLI/server experience.
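
To make the geometry concrete, here is a minimal, self-contained Rust sketch (not code from the repository) of the two preprocessing steps just described: aspect-preserving letterboxing onto a square canvas, and computing the tile grid for high-resolution crops. The canvas and tile sizes are illustrative placeholders.

    // Illustrative sketch only; sizes below are hypothetical, not the project's.

    /// Dimensions that fit `(w, h)` inside a `canvas` x `canvas` square while
    /// preserving aspect ratio; the leftover area is padding (letterboxing).
    fn letterbox(w: u32, h: u32, canvas: u32) -> (u32, u32) {
        let scale = f64::from(canvas) / f64::from(w.max(h));
        (
            (f64::from(w) * scale).round() as u32,
            (f64::from(h) * scale).round() as u32,
        )
    }

    /// Tile columns and rows needed to cover the image with `tile`-sized
    /// crops -- the "dynamic tiling" idea for high-resolution inputs.
    fn tile_grid(w: u32, h: u32, tile: u32) -> (u32, u32) {
        (w.div_ceil(tile), h.div_ceil(tile))
    }

    fn main() {
        let (lw, lh) = letterbox(3000, 1200, 1024);
        println!("letterboxed content area: {lw}x{lh}"); // 1024x410
        let (cols, rows) = tile_grid(3000, 1200, 640);
        println!("tile grid: {cols}x{rows}"); // 5 columns x 2 rows
    }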

Quick Start & Requirements

  • Primary Install/Run: Clone the repository (git clone), navigate into it (cd), and fetch dependencies (cargo fetch). Model assets (~6.3GB) download automatically on first use.
    • CLI: cargo run -p deepseek-ocr-cli --release -- [args] or cargo install --path crates/cli (a scripted-invocation sketch follows this list).
    • Server: cargo run -p deepseek-ocr-server --release -- [args].
  • Prerequisites: Rust 1.78+ and Git. Optional: macOS 13+ for Metal acceleration, CUDA 12.2+ toolkit for NVIDIA GPU acceleration (Linux/Windows).
  • Resource Footprint: Requires approximately 13GB of RAM headroom during inference.
  • Links: No external quick-start or demo links provided beyond the repository itself.
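
For batch work, the installed CLI can also be driven from another Rust program. The sketch below assumes the binary installs under the package name deepseek-ocr-cli and lands on PATH after cargo install; the --image and --prompt flags are hypothetical stand-ins, since the exact argument names are not documented here (consult the CLI's --help).

    // Hypothetical invocation: the flag names below are assumptions, not the
    // CLI's documented interface.
    use std::process::Command;

    fn main() -> std::io::Result<()> {
        let output = Command::new("deepseek-ocr-cli")
            .args(["--image", "invoice.png", "--prompt", "Extract all text."])
            .output()?;

        if output.status.success() {
            print!("{}", String::from_utf8_lossy(&output.stdout));
        } else {
            eprint!("OCR failed: {}", String::from_utf8_lossy(&output.stderr));
        }
        Ok(())
    }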

Highlighted Details

  • Dual Entrypoints: Offers both a batteries-included CLI for batch processing and a Rocket-based HTTP server exposing OpenAI-compatible /v1/responses and /v1/chat/completions endpoints.
  • Automatic Asset Management: Downloads model weights, configuration files, and tokenizers automatically, choosing between Hugging Face and ModelScope based on measured latency.
  • Hardware Acceleration: Optimized for Apple Silicon with an optional Metal backend (--features metal, --device metal). Experimental CUDA support (--features cuda, --device cuda) is available for NVIDIA GPUs.
  • OpenAI Compatibility: Serves as a drop-in backend for OpenAI SDKs, automatically compacting chat history into OCR-friendly prompts (see the sketch after this list).
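
Because the server speaks the OpenAI chat API, any HTTP client can call it. The following is a hedged Rust sketch using reqwest (with its blocking and json features) and serde_json; the port (Rocket's default, 8000), the model name, and the data-URI image convention are assumptions to verify against the repository's README.

    use serde_json::json;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // OpenAI-style chat request; the model name and image encoding are
        // assumptions, not confirmed by the project's documentation.
        let body = json!({
            "model": "deepseek-ocr",
            "messages": [{
                "role": "user",
                "content": [
                    { "type": "text", "text": "Transcribe this document." },
                    { "type": "image_url",
                      "image_url": { "url": "data:image/png;base64,..." } }
                ]
            }]
        });

        let resp = reqwest::blocking::Client::new()
            .post("http://127.0.0.1:8000/v1/chat/completions") // assumed default port
            .json(&body)
            .send()?
            .error_for_status()?;

        println!("{}", resp.text()?);
        Ok(())
    }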

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord/Slack), sponsorships, or a public roadmap beyond technical development items are present in the README.

Licensing & Compatibility

This project inherits the licenses of its dependencies and the upstream DeepSeek-OCR model. Users must refer to DeepSeek-OCR/LICENSE for model terms and apply the same restrictions to downstream use. Compatibility for commercial use or closed-source linking is subject to these inherited terms.

Limitations & Caveats

NVIDIA CUDA GPU support is currently alpha-quality and may exhibit rough edges. Some numerical deltas between the Rust implementation and the PyTorch reference (mainly projector normalization and vision tiling) are still being addressed and are tracked on the roadmap. The server automatically collapses multi-turn chat input to the latest user message, which may not suit all conversational use cases.

Health Check
  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 29
  • Star History: 1,645 stars in the last 10 days
