DeepSeek-OCR-WebUI by neosun100

Intelligent OCR web application for diverse document and image analysis

Created 8 months ago

435 stars

Top 67.8% on SourcePulse

Project Summary

DeepSeek-OCR-WebUI provides a ready-to-use web interface for the DeepSeek-OCR model for recognizing text in images and documents. It targets engineers and power users who want OCR with batch processing, multilingual support, and real-time logging, and who prefer to run it in a local or private deployment.

How It Works

This project wraps the DeepSeek-OCR model in a responsive web application. It offers seven recognition modes (e.g., Document to Markdown, General OCR, Find & Locate) to cover different use cases. Key design choices include native Apple Silicon (MPS) acceleration for Mac users, an optional ModelScope fallback for users who cannot reach HuggingFace (for example, in mainland China), and automatic PDF-to-image conversion so that multi-page documents can be processed in batches. The UI overlays bounding boxes on the image to show where recognized content is located.

Quick Start & Requirements

Installation: Docker (recommended), Mac Native (conda), or Linux Native.
Prerequisites:
- Docker: Docker, Docker Compose, NVIDIA GPU/Drivers (for acceleration), 8GB+ RAM, 20GB+ disk.
- Mac (Apple Silicon): macOS on an M1/M2/M3/M4 chip, Python 3.11+, 16GB+ RAM (recommended), 20GB+ disk.
- Linux (Native): Python 3.11+, NVIDIA GPU/CUDA (optional), 8GB+ RAM, 20GB+ disk.
Resource Footprint: Docker image ~20GB; native Mac setup downloads ~7GB model.
Documentation: Detailed guides available for Docker (DOCKER_HUB.md), API (API.md), and MCP setup (MCP_SETUP.md).
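Before a native install, the Python and disk requirements listed above can be checked with the standard library. This is a rough sketch under stated assumptions: `check_prerequisites` is a hypothetical helper, not part of the project, and the thresholds are taken from the prerequisites above (Python 3.11+, 20GB+ free disk).

```python
import shutil
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 11)       # from the prerequisites: Python 3.11+
MIN_FREE_DISK_GB = 20      # from the prerequisites: 20GB+ disk


def check_prerequisites(
    path: str = ".",
    version: Tuple[int, int] = sys.version_info[:2],
    free_bytes: Optional[int] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    if free_bytes is None:
        # Free space on the filesystem that contains `path`.
        free_bytes = shutil.disk_usage(path).free
    problems: List[str] = []
    if tuple(version) < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, need "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+"
        )
    free_gb = free_bytes / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"only {free_gb:.1f} GB free disk, need {MIN_FREE_DISK_GB} GB+"
        )
    return problems


for problem in check_prerequisites():
    print("WARNING:", problem)
```

RAM and GPU checks are left out because the standard library has no portable way to query them.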

Highlighted Details

- 7 Recognition Modes: Supports Document, OCR, Chart, Find, Freeform, Custom Prompt, and Image Description tasks.
- Native Apple Silicon Support: Uses the MPS backend to accelerate OCR on Mac M1/M2/M3/M4.
- PDF Processing: Automatically converts PDF files into images for OCR, including multi-page documents.
- ModelScope Fallback: Switches to ModelScope when HuggingFace is unavailable, which helps users in mainland China.
- Batch Processing & Visualization: Recognizes multiple images sequentially and displays results with bounding box annotations.
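The combination of PDF conversion and sequential batch recognition described in this list could be organized roughly as follows. This is an illustrative sketch, not the project's implementation: `run_batch` is a hypothetical name, and the `pdf_to_images` and `recognize` callables are placeholders for whatever PDF renderer and OCR call the application actually uses.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def run_batch(
    paths: Iterable[str],
    pdf_to_images: Callable[[str], List[str]],
    recognize: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Process inputs one after another, expanding PDFs into page images first."""
    results: List[Tuple[str, str]] = []
    for path in paths:
        if Path(path).suffix.lower() == ".pdf":
            pages = pdf_to_images(path)   # one image per PDF page
        else:
            pages = [path]                # a plain image is a single "page"
        for index, page in enumerate(pages, start=1):
            results.append((f"{path}#page{index}", recognize(page)))
    return results


def fake_pdf_to_images(path: str) -> List[str]:
    # Placeholder renderer: pretends every PDF has two pages.
    return [f"{path}-p1.png", f"{path}-p2.png"]


def fake_recognize(image: str) -> str:
    # Placeholder OCR call.
    return f"text of {image}"


batch_results = run_batch(["scan.png", "report.pdf"], fake_pdf_to_images, fake_recognize)
for label, text in batch_results:
    print(label, "->", text)
```

Keeping the renderer and the recognizer as parameters lets the same loop serve single images and multi-page PDFs.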

Maintenance & Community

The project is actively maintained, with recent updates including Apple Silicon support and PDF processing. Community engagement is encouraged via contributions, issue tracking, and discussions. Specific links for community channels like Discord or Slack are not provided.

Licensing & Compatibility

Licensed under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes and integration into closed-source projects.

Limitations & Caveats

Hardware acceleration depends on specific hardware (NVIDIA GPUs or Apple Silicon); without it, performance will be lower. Native installations require Python 3.11+ and significant disk space for model downloads. The project does not explicitly document known bugs or unsupported platforms beyond its hardware recommendations.
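Choosing an accelerator at startup typically comes down to a small priority check. The sketch below assumes a CUDA, then MPS, then CPU preference order, which is a common convention and not confirmed by the project; `pick_device` is a hypothetical helper.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name, preferring CUDA, then Apple MPS, then CPU.

    The preference order is an assumption for illustration. With PyTorch,
    the two flags would usually come from torch.cuda.is_available() and
    torch.backends.mps.is_available().
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Example: an Apple Silicon Mac without an NVIDIA GPU.
print(pick_device(cuda_available=False, mps_available=True))
```

Taking the availability flags as arguments keeps the decision logic independent of any particular deep learning framework.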
