Discover and explore top open-source AI tools and projects—updated daily.
datalab-toDocument intelligence toolkit for OCR, layout, and table recognition
Top 2.6% on SourcePulse
Summary
Surya is a 650M parameter Vision-Language Model (VLM) for comprehensive document intelligence, covering OCR, layout analysis, reading order, and table recognition in 90+ languages. It offers high accuracy and speed for researchers and power users, outperforming many larger models in its parameter class.
How It Works
Surya uses a unified VLM for layout analysis, OCR, and table recognition, complemented by a separate text-line detection model. This integrated approach enables efficient processing, outputting structured data like HTML. Deployment is flexible, leveraging external inference backends such as vLLM for GPUs or llama.cpp for CPUs/Apple Silicon.
Quick Start & Requirements
pip install surya-ocrvllm (NVIDIA GPU) or llama.cpp (CPU/Apple Silicon).llama-server binary from llama.cpp (e.g., brew install llama.cpp on macOS).llama.cpp releases: https://github.com/ggml-org/llama.cpp/releasessurya_gui after pip install streamlit pdftext.Highlighted Details
<math> tags.Maintenance & Community
Developed by Vikas Paruchuri and the Datalab Team. No specific community channels or roadmap links are provided. Contact hi@datalab.to for fine-tuning assistance.
Licensing & Compatibility
Code is Apache 2.0. Model weights use a modified AI Pubs Open Rail-M license (free for research, personal, and startups <$5M revenue). Broader commercial use requires a separate license via their pricing page.
Limitations & Caveats
Specialized for document intelligence; not optimized for natural scene photos. Core analysis tasks require a running inference backend (vLLM or llama.cpp). Performance can be sensitive to image resolution/quality, potentially needing preprocessing or threshold adjustments.
1 day ago
Inactive
rednote-hilab