Discover and explore top open-source AI tools and projects—updated daily.
Tencent-HunyuanAdvanced OCR and document understanding via lightweight VLM
New!
Top 45.3% on SourcePulse
HunyuanOCR is an end-to-end OCR expert VLM built on a lightweight 1B parameter multimodal architecture. It achieves state-of-the-art performance across complex multilingual document parsing, text spotting, information extraction, video subtitle extraction, and photo translation, offering significant deployment cost reductions and enhanced usability compared to cascaded solutions.
How It Works
Leveraging Hunyuan's native multimodal architecture and training strategy, HunyuanOCR achieves SOTA performance with a remarkably efficient 1B parameter design. This end-to-end approach integrates text detection, recognition, complex document parsing, information extraction, and translation into a single model, simplifying inference and reducing deployment costs. Its design prioritizes ultimate usability through single-instruction, single-inference operations.
Quick Start & Requirements
vllm (pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly or uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly). Transformers installation is also available (pip install git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4).run_hy_ocr.py is available in Hunyuan-OCR-master/Hunyuan-OCR-hf.Highlighted Details
Maintenance & Community
The project acknowledges contributions and ideas from PaddleOCR, MinerU, MonkeyOCR, DeepSeek-OCR, dots.ocr, and benchmarks like OminiDocBench, OCRBench, DoTA. Support from vLLM and Hugging Face Communities for inference is also noted. No explicit community links or roadmap details are provided.
Licensing & Compatibility
No explicit license information is provided in the README.
Limitations & Caveats
The Transformers inference method currently exhibits performance degradation compared to the vLLM framework, though this is being addressed. The requirement for CUDA 12.9 is specific and may pose an adoption barrier.
3 days ago
Inactive
rednote-hilab