DeepSeek-OCR-Web by fufankeji

Multimodal document parsing studio for PDFs and images

Created 8 months ago

566 stars

Top 56.1% on SourcePulse

Project Summary

This project provides an out-of-the-box web studio for DeepSeek-OCR, enabling multimodal document parsing for PDFs and images. It targets users needing efficient, high-precision OCR, layout analysis, and specialized extraction of tables, charts, and domain-specific drawings, converting complex documents into structured Markdown.

How It Works

The studio pairs a React frontend with a FastAPI backend and uses the DeepSeek-OCR model for recognition. It accepts PDFs and images and performs OCR, layout analysis, and specialized recognition of tables, charts, and professional drawings. The extracted content is structured so that it can be converted to Markdown.
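As an illustration of how a backend might tell the two supported input types apart, here is a minimal sketch that classifies an upload by its leading bytes. It is not taken from the project: the function name `detect_kind` and the choice to support only PDF, PNG, and JPEG signatures are assumptions made for the example.

```python
from typing import Optional

# Well-known file signatures (magic bytes) for the formats in this sketch.
_SIGNATURES = (
    (b"%PDF-", "pdf"),               # PDF header
    (b"\x89PNG\r\n\x1a\n", "image"),  # PNG signature
    (b"\xff\xd8\xff", "image"),       # JPEG start-of-image marker
)


def detect_kind(data: bytes) -> Optional[str]:
    """Return "pdf" or "image" from the leading bytes, or None if unknown.

    Hypothetical helper for illustration only; the project's own
    dispatch logic may differ.
    """
    for magic, kind in _SIGNATURES:
        if data.startswith(magic):
            return kind
    return None
```

Checking the content rather than the file extension avoids misrouting a file that was renamed before upload.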

Quick Start & Requirements

Primary Install: One-click script (install.sh, start.sh) or manual installation.
Prerequisites: Linux; a GPU with at least 7 GB of VRAM (16–24 GB recommended); Python 3.10–3.12 (3.10 or 3.11 recommended); CUDA 11.8, 12.1, or 12.2 with a matching GPU driver; and a PyTorch build that matches the installed CUDA version.
Compatibility Note: RTX 50 series GPUs are currently incompatible.
Links: Model weights are available from Hugging Face or ModelScope. No explicit link to the project repository is given.
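The requirements above can be expressed as a small pre-flight check. This is a sketch under stated assumptions, not part of the project's install.sh: the function `check_prerequisites` and its parameters are hypothetical, and the caller is expected to supply the CUDA version, VRAM size, and GPU generation from whatever tool it uses to query the hardware.

```python
import platform
import sys
from typing import List, Optional, Tuple

# Values taken from the prerequisites listed in this summary.
MIN_VRAM_GB = 7.0
PYTHON_RANGE = ((3, 10), (3, 12))
SUPPORTED_CUDA = ("11.8", "12.1", "12.2")


def check_prerequisites(
    system: str,
    python: Tuple[int, int],
    cuda: Optional[str],
    vram_gb: float,
    is_rtx_50_series: bool = False,
) -> List[str]:
    """Return a list of human-readable problems; empty means all checks pass.

    Hypothetical helper: hardware facts are passed in by the caller
    rather than detected here.
    """
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS: {system} (Linux required)")
    low, high = PYTHON_RANGE
    if not (low <= tuple(python) <= high):
        problems.append(f"unsupported Python: {python[0]}.{python[1]}")
    if cuda not in SUPPORTED_CUDA:
        problems.append(f"unsupported CUDA: {cuda}")
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"insufficient VRAM: {vram_gb} GB (< {MIN_VRAM_GB} GB)")
    if is_rtx_50_series:
        problems.append("RTX 50 series GPUs are currently incompatible")
    return problems


if __name__ == "__main__":
    # Example values for CUDA and VRAM; replace with what your machine reports.
    issues = check_prerequisites(
        platform.system(), sys.version_info[:2], cuda="12.1", vram_gb=16.0
    )
    print("OK" if not issues else "\n".join(issues))
```

Returning a list of problems rather than raising on the first failure lets a user see every mismatch in one run.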

Highlighted Details

Supports multi-format document parsing (PDF, images).
Features intelligent OCR recognition powered by DeepSeek-OCR.
Performs accurate layout analysis and content extraction.
Offers multi-language text recognition (e.g., Chinese, English).
Includes professional table and chart parsing capabilities.
Recognizes professional domain drawings (CAD, flowcharts).
Supports reverse parsing of data visualization charts.
Converts PDF content to structured Markdown format.
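To make the Markdown conversion concrete, the sketch below renders one parsed table as a pipe-delimited Markdown table. The input shape (a header row plus a list of rows of strings) and the name `table_to_markdown` are assumptions chosen for illustration; the project's actual intermediate format is not described in this summary.

```python
from typing import Sequence


def _escape(cell: object) -> str:
    """Escape characters that would break a Markdown table cell."""
    return str(cell).replace("|", "\\|").replace("\n", " ")


def _line(cells: Sequence[object]) -> str:
    """Join cells into one pipe-delimited table row."""
    return "| " + " | ".join(_escape(c) for c in cells) + " |"


def table_to_markdown(
    header: Sequence[str], rows: Sequence[Sequence[object]]
) -> str:
    """Render a header and rows as a Markdown table.

    Hypothetical helper: assumes every row has as many cells as the header.
    """
    lines = [_line(header), _line(["---"] * len(header))]
    lines.extend(_line(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_to_markdown(["Item", "Qty"], [["Bolt", "4"], ["Nut", "8"]]))
```

Escaping the pipe character and flattening newlines keeps a cell's text from being read as extra columns or rows.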

Maintenance & Community

Contributions are welcome through GitHub pull requests and issues. Technical discussion takes place through a dedicated assistant/group, which users can join by replying "DeepSeekOCR".

Licensing & Compatibility

The provided README does not state the project's license. It also does not say whether the project may be used commercially or linked with closed-source software.

Limitations & Caveats

The system runs only on Linux and is currently incompatible with RTX 50 series GPUs. It requires specific Python and CUDA versions, and the CUDA version must match the installed GPU driver.

Explore Similar Projects

AWESOME-OCR-LLM by Yuliang-Liu

Versatile-OCR-Program by raphael-seo

pdfmd by M1ck4

Folio-OCR by vorojar

PolyglotPDF by CBIhalsen

nlm-ingestor by nlmatics

OnnxOCR by jingsongliujing

kordoc by chrisryugj

pdf-craft by oomol-lab

PyMuPDF by pymupdf

liteparse by run-llama

MinerU by opendatalab