DeekSeek-OCR---Dockerized-API  by Bogdanovich77

PDF to Markdown converter with OCR and API

Created 1 week ago

963 stars

Top 38.3% on SourcePulse

Project Summary

This project provides a Dockerized REST API and batch-processing scripts that convert PDF documents to Markdown using DeepSeek-OCR. It targets developers and users who need OCR and document structuring, and it ships with pre-applied fixes for critical bugs in the upstream library.

How It Works

The solution uses the DeepSeek-OCR model for optical character recognition and document understanding and serves it through a FastAPI backend. It offers both a REST API for real-time processing and Python scripts for batch conversion. A key differentiator is a set of custom Python files that replace core components of the original DeepSeek-OCR library during the Docker build. These patches fix critical bugs, such as missing prompt parameters in model initialization, and allow custom configuration and custom prompts.

Quick Start & Requirements

  • Primary install/run command: Use docker-compose build to build the image and docker-compose up -d to start the service.
  • Non-default prerequisites: NVIDIA GPU with CUDA 11.8+ support, minimum 12GB VRAM (model uses ~9GB), 32GB+ system RAM (64GB recommended), 50GB+ storage, Python 3.8+, Docker 20.10+ with GPU support, Docker Compose 2.0+, NVIDIA Container Toolkit.
  • Setup: Requires downloading DeepSeek-OCR model weights (e.g., via huggingface-cli download deepseek-ai/DeepSeek-OCR --local-dir models/deepseek-ai/DeepSeek-OCR).
  • Links: Model download instructions provided.
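
The setup steps above can be collected into a small Python helper. This is a minimal sketch: the command strings come from the list above, but the function names, the dry_run flag, and the ordering (weights downloaded before the image is built) are assumptions for illustration and are not part of the project.

```python
import shlex
import subprocess

# Repository ID and target directory as given in the Setup bullet above.
MODEL_REPO = "deepseek-ai/DeepSeek-OCR"
MODEL_DIR = "models/deepseek-ai/DeepSeek-OCR"


def quick_start_commands(model_repo=MODEL_REPO, model_dir=MODEL_DIR):
    """Return the setup commands as argv lists (hypothetical helper).

    The order is an assumption: download weights, build, then start.
    """
    return [
        ["huggingface-cli", "download", model_repo, "--local-dir", model_dir],
        ["docker-compose", "build"],
        ["docker-compose", "up", "-d"],
    ]


def run_quick_start(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in quick_start_commands():
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_quick_start(dry_run=True)
```

With dry_run=True the helper only prints the commands, which makes it safe to run on a machine without Docker or a GPU.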

Highlighted Details

  • Provides multiple processing scripts: basic Markdown conversion, enhanced Markdown with image extraction, plain OCR extraction, and custom prompt processing (both raw and enhanced).
  • Offers REST API endpoints for processing single images, PDFs, and batch uploads.
  • Includes client integration examples for Python and JavaScript.
  • Features custom configuration (custom_config.py) and patched run scripts (custom_run_dpsk_ocr_*.py) that fix critical initialization bugs and allow custom prompts via API or command line.
  • Supports custom prompts defined in custom_prompt.yaml or passed dynamically.
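
A Python client for the PDF endpoint might look roughly like the sketch below. The summary does not name the endpoint paths, port, or form fields, so BASE_URL, PDF_ENDPOINT, FILE_FIELD, and the "prompt" field are all hypothetical placeholders. Check the project's README for the real values before using it.

```python
import urllib.request
import uuid

# All four values below are assumptions, not taken from the project.
BASE_URL = "http://localhost:8000"
PDF_ENDPOINT = "/ocr/pdf"
FILE_FIELD = "file"
PROMPT_FIELD = "prompt"


def build_pdf_request(pdf_bytes, filename, prompt=None, base_url=BASE_URL):
    """Build a multipart/form-data POST request carrying one PDF.

    An optional custom prompt is sent as a separate form field.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if prompt is not None:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{PROMPT_FIELD}"\r\n\r\n'
                f"{prompt}\r\n"
            ).encode("utf-8")
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{FILE_FIELD}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode("utf-8")
        + pdf_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return urllib.request.Request(
        base_url + PDF_ENDPOINT,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it. The sketch stops at building the request so it can be inspected without a running service.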

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps are provided in the README.

Licensing & Compatibility

The project states that it follows the same license as the DeepSeek-OCR project. The README does not elaborate on specific license terms or on compatibility with commercial use or closed-source linking; consult the original DeepSeek-OCR project's license for those details.

Limitations & Caveats

This solution has significant hardware requirements: it needs an NVIDIA GPU with CUDA support and substantial VRAM and RAM, making it unsuitable for CPU-only or low-resource environments. The reliance on custom patches to the underlying library may introduce maintenance overhead or compatibility issues if the upstream DeepSeek-OCR library undergoes major changes. The specific license terms are not detailed here.
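
The hardware minimums listed under Quick Start can be turned into a simple preflight check. The sketch below is illustrative only: the thresholds come from the prerequisites list, while the function name is hypothetical and the caller is assumed to supply measured values in gigabytes.

```python
# Minimums taken from the prerequisites listed under Quick Start.
MIN_VRAM_GB = 12
MIN_RAM_GB = 32
MIN_DISK_GB = 50


def unmet_requirements(vram_gb, ram_gb, disk_gb):
    """Return one message per hardware minimum that is not met."""
    checks = [
        ("GPU VRAM", vram_gb, MIN_VRAM_GB),
        ("system RAM", ram_gb, MIN_RAM_GB),
        ("free storage", disk_gb, MIN_DISK_GB),
    ]
    return [
        f"{name}: have {have} GB, need at least {need} GB"
        for name, have, need in checks
        if have < need
    ]


if __name__ == "__main__":
    # An 8 GB GPU falls short of the 12 GB minimum; RAM and storage pass.
    for problem in unmet_requirements(vram_gb=8, ram_gb=64, disk_gb=100):
        print(problem)
```

An empty list means all three minimums are met. It does not cover the CUDA, Docker, or NVIDIA Container Toolkit prerequisites.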

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 5
  • Issues (30d): 8
  • Star history: 970 stars in the last 13 days
