Bogdanovich77
PDF to Markdown converter with OCR and API
This project provides a Dockerized REST API and batch processing scripts for converting PDF documents to Markdown format using DeepSeek-OCR. It targets developers and users needing robust OCR and document structuring capabilities, offering a flexible solution with pre-applied fixes for critical library issues.
How It Works
The solution uses the DeepSeek-OCR model for optical character recognition and document understanding and serves it through a FastAPI backend. It offers both a REST API for real-time processing and Python scripts for batch conversion. A key differentiator is a set of custom Python files that replace core components of the original DeepSeek-OCR library during the Docker build, so the patches need no manual steps from the user. These patches fix critical bugs, such as missing prompt parameters in model initialization, and make configuration and prompts more flexible.
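The file-replacement idea described above can be sketched in a few lines of Python. This is only an illustration of the overlay concept, not the project's build step, which runs inside its Docker build. The function name `overlay_patches`, the directory layout, and the rule that `custom_config.py` replaces `config.py` (stripping a `custom_` prefix) are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def overlay_patches(patch_dir: Path, target_dir: Path, prefix: str = "custom_") -> list[str]:
    """Copy every `<prefix>*.py` patch over the upstream file it replaces.

    Assumption: a patch named custom_config.py replaces config.py.
    The real project's mapping lives in its Docker build and may differ.
    """
    replaced = []
    for patch in sorted(patch_dir.glob(f"{prefix}*.py")):
        upstream = target_dir / patch.name[len(prefix):]
        if not upstream.exists():
            # Refuse to create new files: a patch with no upstream
            # counterpart usually means the library layout has changed.
            raise FileNotFoundError(f"no upstream file for {patch.name}")
        shutil.copyfile(patch, upstream)
        replaced.append(upstream.name)
    return replaced


def demo() -> tuple[list[str], str]:
    """Run the overlay on throwaway directories and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        patches, library = Path(tmp, "patches"), Path(tmp, "library")
        patches.mkdir()
        library.mkdir()
        (library / "config.py").write_text("PROMPT = None\n")
        (patches / "custom_config.py").write_text("PROMPT = 'placeholder'\n")
        replaced = overlay_patches(patches, library)
        return replaced, (library / "config.py").read_text()


if __name__ == "__main__":
    print(demo())
```

Failing loudly when a patch has no upstream counterpart is a deliberate choice in this sketch: it surfaces upstream layout changes at build time rather than at inference time.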
Quick Start & Requirements
Build the image with docker-compose build, then start the service with docker-compose up -d.
Download the model weights: huggingface-cli download deepseek-ai/DeepSeek-OCR --local-dir models/deepseek-ai/DeepSeek-OCR
Highlighted Details
… custom_config.py) and patched run scripts (custom_run_dpsk_ocr_*.py) that fix critical initialization bugs and allow custom prompts via API or command line.
… custom_prompt.yaml or passed dynamically.
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmaps are provided in the README.
Licensing & Compatibility
The project states it follows the same license as the DeepSeek-OCR project. Specific license details and compatibility for commercial use or closed-source linking are not elaborated upon in this README and require consulting the original DeepSeek-OCR project's license.
Limitations & Caveats
This solution has significant hardware requirements: it needs an NVIDIA GPU with CUDA support and substantial VRAM and RAM, which makes it unsuitable for CPU-only or low-resource environments. Because it relies on custom patches to the underlying library, major changes to the upstream DeepSeek-OCR library may break the patches or add maintenance overhead.
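A quick preflight check can confirm the GPU requirement before building the image. The sketch below assumes the NVIDIA driver's `nvidia-smi` tool is on the PATH. The helper names and the `required_mib` threshold are hypothetical, since the README summary gives no concrete VRAM figure.

```python
import shutil
import subprocess


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one total-memory value (MiB) per GPU, skipping non-numeric lines."""
    return [int(line) for line in smi_output.splitlines() if line.strip().isdigit()]


def gpu_vram_mib() -> list[int]:
    """Return total VRAM per visible NVIDIA GPU, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_vram_mib(result.stdout)


def has_enough_vram(vram_mib: list[int], required_mib: int) -> bool:
    """True if at least one GPU meets the caller-supplied threshold (hypothetical)."""
    return any(v >= required_mib for v in vram_mib)


if __name__ == "__main__":
    gpus = gpu_vram_mib()
    print("GPUs (MiB):", gpus if gpus else "none detected")
```

An empty list only means no GPU could be queried from this environment. Inside a container, the result also depends on how the GPU is exposed to Docker.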