DeekSeek-OCR---Dockerized-API by Bogdanovich77

PDF to Markdown converter with OCR and API

Created 4 months ago

1,092 stars

Top 34.7% on SourcePulse

Project Summary

This project provides a Dockerized REST API and batch processing scripts for converting PDF documents to Markdown format using DeepSeek-OCR. It targets developers and users needing robust OCR and document structuring capabilities, offering a flexible solution with pre-applied fixes for critical library issues.

How It Works

The solution leverages the DeepSeek-OCR model for optical character recognition and document understanding, powered by a FastAPI backend. It offers both a REST API for real-time processing and Python scripts for batch conversion. A key differentiator is the inclusion of custom Python files that transparently replace core components of the original DeepSeek-OCR library during the Docker build. These patches address critical bugs, such as missing prompt parameters in model initialization, and enable enhanced configuration and prompt flexibility.

Quick Start & Requirements

Primary install/run command: Use docker-compose build to build the image and docker-compose up -d to start the service.
Non-default prerequisites: NVIDIA GPU with CUDA 11.8+ support, minimum 12GB VRAM (model uses ~9GB), 32GB+ system RAM (64GB recommended), 50GB+ storage, Python 3.8+, Docker 20.10+ with GPU support, Docker Compose 2.0+, NVIDIA Container Toolkit.
Setup: Requires downloading DeepSeek-OCR model weights (e.g., via huggingface-cli download deepseek-ai/DeepSeek-OCR --local-dir models/deepseek-ai/DeepSeek-OCR).
Links: Model download instructions provided.

Highlighted Details

Provides multiple processing scripts: basic Markdown conversion, enhanced Markdown with image extraction, plain OCR extraction, and custom prompt processing (both raw and enhanced).
Offers REST API endpoints for processing single images, PDFs, and batch uploads.
Includes client integration examples for Python and JavaScript.
Features custom configuration (custom_config.py) and patched run scripts (custom_run_dpsk_ocr_*.py) that fix critical initialization bugs and allow custom prompts via API or command line.
Supports custom prompts defined in custom_prompt.yaml or passed dynamically.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps are provided in the README.

Licensing & Compatibility

The project states it follows the same license as the DeepSeek-OCR project. Specific license details and compatibility for commercial use or closed-source linking are not elaborated upon in this README and require consulting the original DeepSeek-OCR project's license.

Limitations & Caveats

This solution has significant hardware requirements, mandating an NVIDIA GPU with CUDA support and substantial VRAM/RAM, making it unsuitable for CPU-bound or low-resource environments. The reliance on custom patches to the underlying library may introduce maintenance overhead or compatibility issues if the upstream DeepSeek-OCR library undergoes major changes. The specific license terms are not detailed here.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

8 stars in the last 30 days