DeepSeek-OCR-WebUI by neosun100

Intelligent OCR web application for diverse document and image analysis

Created 2 months ago
285 stars

Top 91.9% on SourcePulse

Project Summary

DeepSeek-OCR-WebUI provides a ready-to-use, modern web interface for the DeepSeek-OCR model, enabling efficient image and document text recognition. It targets engineers and power users seeking robust OCR capabilities with advanced features like batch processing, multilingual support, and real-time logging, offering a flexible solution for local or private deployments.

How It Works

This project wraps the DeepSeek-OCR model within a responsive web application. It employs a multi-mode approach, offering seven distinct recognition types (e.g., Document to Markdown, General OCR, Find & Locate) to cater to diverse use cases. Key architectural choices include native Apple Silicon (MPS) acceleration for Mac users, an optional ModelScope fallback for improved accessibility in China, and seamless PDF-to-image conversion for batch processing. The UI features bounding box visualization for precise location identification.
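The accelerator choice described above can be sketched as a simple preference order. This is a minimal illustration, not the project's actual code: the function names `choose_device` and `detect_device` are hypothetical, and the CUDA-then-MPS-then-CPU ordering is an assumption. The only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device name from availability flags (assumed preference order)."""
    if cuda_available:
        return "cuda"   # NVIDIA GPU
    if mps_available:
        return "mps"    # Apple Silicon (Metal Performance Shaders)
    return "cpu"        # no accelerator found


def detect_device() -> str:
    """Probe PyTorch for accelerators; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return choose_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

Keeping the decision in a pure function makes the ordering easy to test without any particular hardware present.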

Quick Start & Requirements

  • Installation: Docker (recommended), Mac Native (conda), or Linux Native.
  • Prerequisites:
    • Docker: Docker, Docker Compose, NVIDIA GPU/Drivers (for acceleration), 8GB+ RAM, 20GB+ disk.
    • Mac (Apple Silicon): macOS on an M1/M2/M3/M4 chip, Python 3.11+, 16GB+ RAM (recommended), 20GB+ disk.
    • Linux (Native): Python 3.11+, NVIDIA GPU/CUDA (optional), 8GB+ RAM, 20GB+ disk.
  • Resource Footprint: Docker image ~20GB; native Mac setup downloads ~7GB model.
  • Documentation: Detailed guides available for Docker (DOCKER_HUB.md), API (API.md), and MCP setup (MCP_SETUP.md).
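Before a native install, the Python and disk prerequisites listed above could be checked with a short standard-library script. This is a sketch under stated assumptions: the helper names are hypothetical, "20GB+ disk" is interpreted as free space on the target path, and gigabytes are counted as 1024^3 bytes. RAM and GPU checks are omitted because they need platform-specific tooling.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)      # from the listed prerequisite "Python 3.11+"
MIN_FREE_DISK_GB = 20     # assumption: "20GB+ disk" means free space


def check_prerequisites(python_version, free_disk_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required"
        )
    free_gb = free_disk_bytes / 1024 ** 3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"{free_gb:.1f} GB free disk, {MIN_FREE_DISK_GB} GB+ required"
        )
    return problems


def check_this_machine(path="."):
    """Run the checks against the current interpreter and the given path."""
    return check_prerequisites(sys.version_info, shutil.disk_usage(path).free)


if __name__ == "__main__":
    issues = check_this_machine()
    print("OK" if not issues else "\n".join(issues))
```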

Highlighted Details

  • 7 Recognition Modes: Supports Document, OCR, Chart, Find, Freeform, Custom Prompt, and Image Description tasks.
  • Native Apple Silicon Support: Leverages MPS backend for accelerated OCR on Mac M1/M2/M3/M4.
  • PDF Processing: Automatically converts PDF files into images for OCR, supporting multi-page documents.
  • ModelScope Fallback: Intelligently switches to ModelScope when HuggingFace is unavailable, enhancing usability for users in mainland China.
  • Batch Processing & Visualization: Enables sequential recognition of multiple images and displays results with bounding box annotations.

Maintenance & Community

The project is actively maintained, with recent updates including Apple Silicon support and PDF processing. Community engagement is encouraged via contributions, issue tracking, and discussions. Specific links for community channels like Discord or Slack are not provided.

Licensing & Compatibility

Licensed under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes and integration into closed-source projects.

Limitations & Caveats

Hardware acceleration requires an NVIDIA GPU or Apple Silicon. Native installations require Python 3.11+ and significant disk space for model downloads. The project does not explicitly document known bugs or unsupported platforms beyond its hardware recommendations.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 8

Star History

40 stars in the last 30 days
