DeepSeek-OCR-Web  by fufankeji

Multimodal document parsing studio for PDFs and images

Created 1 week ago

382 stars

Top 74.4% on SourcePulse

Project Summary

This project provides an out-of-the-box web studio for DeepSeek-OCR, enabling multimodal document parsing for PDFs and images. It targets users needing efficient, high-precision OCR, layout analysis, and specialized extraction of tables, charts, and domain-specific drawings, converting complex documents into structured Markdown.

How It Works

The studio pairs a React frontend with a FastAPI backend and uses the DeepSeek-OCR model for recognition. It accepts PDFs and images, then runs OCR, layout analysis, and specialized recognition for tables, charts, and professional drawings. The extracted content is structured so that it can be converted to Markdown.
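The final step described above, turning recognized layout regions into Markdown, can be illustrated with a minimal sketch. The `Block` class, its `kind` values, and both function names are hypothetical and chosen for illustration only; they are not taken from the project's code, and the real pipeline may represent layout differently.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One recognized layout region (hypothetical structure, not the project's)."""
    kind: str                                  # "heading", "paragraph", or "table"
    text: str = ""                             # used by headings and paragraphs
    rows: list = field(default_factory=list)   # used by tables: list of string rows


def table_to_markdown(rows):
    """Render rows as a pipe table, treating the first row as the header."""
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def blocks_to_markdown(blocks):
    """Join blocks in reading order, separated by blank lines."""
    parts = []
    for block in blocks:
        if block.kind == "heading":
            parts.append("## " + block.text)
        elif block.kind == "table":
            parts.append(table_to_markdown(block.rows))
        else:
            parts.append(block.text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    page = [
        Block("heading", "Results"),
        Block("paragraph", "Totals by year."),
        Block("table", rows=[["Year", "Total"], ["2024", "10"]]),
    ]
    print(blocks_to_markdown(page))
```

Keeping layout analysis and Markdown rendering as separate steps, as in this sketch, means the rendering can be tested without a GPU or model weights.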

Quick Start & Requirements

  • Primary Install: One-click script (install.sh, start.sh) or manual installation.
  • Prerequisites: Linux OS; GPU with at least 7 GB VRAM (16–24 GB recommended); Python 3.10–3.12 (3.10 or 3.11 recommended); CUDA 11.8 or 12.1/12.2, matched to the installed driver; a PyTorch build that matches the CUDA version.
  • Compatibility Note: RTX 50 series GPUs are currently incompatible.
  • Links: Model weights are available via Hugging Face or ModelScope. The README does not link the project repository explicitly.
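The prerequisites above can be checked before installation with a small script. The following is a minimal sketch, not part of the project: the function name `check_prerequisites` and its parameters are hypothetical, the caller must supply the VRAM size, CUDA version, and GPU name (for example, read manually from `nvidia-smi`), and matching the substring "RTX 50" in the GPU name is an assumed heuristic for detecting the unsupported series.

```python
import platform
import sys

# Values taken from the prerequisites listed above.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}
SUPPORTED_CUDA = {"11.8", "12.1", "12.2"}
MIN_VRAM_GB = 7


def check_prerequisites(os_name, python_version, vram_gb, cuda_version, gpu_name=""):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if os_name != "Linux":
        problems.append(f"unsupported OS: {os_name} (Linux required)")
    if tuple(python_version[:2]) not in SUPPORTED_PYTHON:
        problems.append(f"unsupported Python: {python_version[0]}.{python_version[1]}")
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"insufficient VRAM: {vram_gb} GB (need at least {MIN_VRAM_GB} GB)")
    if cuda_version not in SUPPORTED_CUDA:
        problems.append(f"unsupported CUDA: {cuda_version}")
    # Assumption: RTX 50 series cards report names containing "RTX 50".
    if "RTX 50" in gpu_name:
        problems.append(f"incompatible GPU: {gpu_name}")
    return problems


if __name__ == "__main__":
    # OS and Python version come from the running interpreter; the GPU
    # values below are placeholders to be replaced with the host's own.
    issues = check_prerequisites(
        platform.system(),
        sys.version_info,
        vram_gb=16,
        cuda_version="12.1",
        gpu_name="NVIDIA GeForce RTX 4090",
    )
    print("OK" if not issues else "\n".join(issues))
```

The sketch does not verify that the CUDA version matches the driver or the PyTorch build; those checks depend on tools installed on the host.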

Highlighted Details

  • Supports multi-format document parsing (PDF, images).
  • Features intelligent OCR recognition powered by DeepSeek-OCR.
  • Performs accurate layout analysis and content extraction.
  • Offers multi-language text recognition (e.g., Chinese, English).
  • Includes professional table and chart parsing capabilities.
  • Recognizes professional domain drawings (CAD, flowcharts).
  • Supports reverse parsing of data visualization charts.
  • Converts PDF content to structured Markdown format.

Maintenance & Community

Contributions are welcomed via GitHub Pull Requests and issues. Technical communication is facilitated through a dedicated assistant/group, accessible by replying "DeepSeekOCR".

Licensing & Compatibility

The project's license is not explicitly stated in the provided README. Compatibility for commercial use or linking with closed-source projects is not detailed.

Limitations & Caveats

The system runs only on Linux and does not currently support RTX 50 series GPUs. Python and CUDA must fall within the supported version ranges, and the CUDA version must match both the GPU driver and the PyTorch build.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 9
  • Star History: 388 stars in the last 12 days
