rednote-hilab: Parse anything from documents with multimodal OCR
Top 96.9% on SourcePulse
Multimodal OCR: Parse Anything from Documents (dots.mocr) is a document parsing system designed to recognize diverse human scripts and structured graphical content. It tackles information extraction from complex documents by combining grounding, recognition, semantic understanding, and dialogue capabilities in a single system. The project reports state-of-the-art results on document parsing benchmarks and introduces conversion of visual elements into SVG code, which makes it relevant to researchers and power users who need advanced document analysis.
How It Works
The core approach uses a multimodal vision-language model (VLM) for unified document understanding. The model is built to convert structured graphics, such as charts, UI layouts, and scientific figures, directly into Scalable Vector Graphics (SVG) code. This direct SVG generation is the project's key differentiator, because it represents visual data precisely as editable vector markup rather than as pixels. A specialized dots.mocr-svg variant further targets image-to-SVG parsing.
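Because the model emits SVG as plain text, downstream code may want to confirm that the output is well-formed before rendering or editing it. The following is a minimal sketch using only Python's standard-library XML parser. The `summarize_svg` helper is hypothetical, and the sample string is hand-written for illustration; it is not actual dots.mocr output.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "{http://www.w3.org/2000/svg}"


def summarize_svg(svg_text: str) -> Counter:
    """Parse SVG text and count its child elements by tag name.

    Hypothetical helper: raises ValueError if the text is not well-formed XML
    or if the root element is not <svg>.
    """
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    # Accept both a namespaced root (xmlns declared) and a bare <svg> root.
    if root.tag not in (SVG_NS + "svg", "svg"):
        raise ValueError(f"root element is {root.tag!r}, expected <svg>")
    counts = Counter()
    for elem in root.iter():
        # Drop the "{namespace}" prefix so tags read as 'rect', 'text', etc.
        tag = elem.tag.split("}", 1)[-1]
        if tag != "svg":  # skip the root (and any nested <svg> wrappers)
            counts[tag] += 1
    return counts


# Hand-written two-bar chart; NOT real model output.
sample = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="60">
  <rect x="10" y="20" width="20" height="40" fill="steelblue"/>
  <rect x="40" y="10" width="20" height="50" fill="steelblue"/>
  <text x="10" y="58">A</text>
</svg>"""

print(summarize_svg(sample))  # Counter({'rect': 2, 'text': 1})
```

Rejecting malformed markup early keeps a single bad generation from breaking a batch rendering or conversion step; a stricter pipeline could additionally validate against the SVG specification.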
Quick Start & Requirements
Highlighted Details
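Text Edit and Read Order Edit, two of the figures listed in this section, are edit-distance scores, so lower values are better. The following is a minimal sketch of a normalized Levenshtein distance. It assumes normalization by the longer sequence's length and does not reproduce OmniDocBench's exact text preprocessing or its matching of predicted blocks to ground truth. The helpers `levenshtein` and `normalized_edit` are hypothetical illustrations, not the benchmark's code.

```python
from collections.abc import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit(pred: Sequence, gt: Sequence) -> float:
    """Edit distance divided by the longer length; 0.0 means an exact match."""
    longest = max(len(pred), len(gt))
    return 0.0 if longest == 0 else levenshtein(pred, gt) / longest


# Text: characters of recognized vs. reference strings.
print(round(normalized_edit("kitten", "sitting"), 4))  # 0.4286
# Reading order: sequences of block indices (two blocks swapped).
print(normalized_edit([0, 1, 2, 3], [0, 2, 1, 3]))  # 0.5
```

The same function serves both cases because it works on any sequence. Characters give a text score, and ordered block IDs give a reading-order score.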
dots.mocr achieves 0.031 Text Edit and 0.029 Read Order Edit on OmniDocBench v1.5.
The dots.mocr-svg variant achieves 0.901 ISVGEN for image-to-SVG parsing.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
2 months ago
Inactive