chandra by datalab-to

High-accuracy OCR for complex documents

Created 3 weeks ago

1,370 stars

Top 29.5% on SourcePulse

Project Summary

Chandra is a high-accuracy OCR model that converts images and PDFs into structured formats such as HTML, Markdown, and JSON, with an emphasis on preserving complex layout information. It handles challenging documents, including those with intricate tables, forms, handwriting, and mixed content, and supports over 40 languages. Chandra offers two inference modes: a local HuggingFace-based approach and a remote inference server powered by vLLM, covering needs from local experimentation to scalable production deployments.

How It Works

At Chandra's core is a document understanding pipeline that reconstructs both the structure and the content of each page. It provides two inference methods: a local mode built on HuggingFace Transformers, which is the easiest to set up and integrate, and a remote mode served by vLLM, which is tuned for throughput in batch processing and production deployments. This split lets users trade setup simplicity against performance as their workload grows.
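
To make the remote path concrete: vLLM servers expose an OpenAI-compatible HTTP API, so a page image can be submitted as a chat completion request. The sketch below is illustrative only; the port (vLLM's default 8000), the model name datalab-to/chandra, and the prompt wording are all assumptions to verify against your server, and chandra's own CLI and client normally wrap this step for you.

```python
# Minimal sketch: query a vLLM OpenAI-compatible server with a page image.
# Assumptions (not confirmed here): port 8000, model name "datalab-to/chandra",
# and a plain "convert to Markdown" prompt.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="datalab-to/chandra",  # assumed model name; check the server's /v1/models endpoint
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Convert this page to Markdown, preserving tables and layout."},
        ],
    }],
)
print(response.choices[0].message.content)
```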

Quick Start & Requirements

Installation is straightforward via pip: pip install chandra-ocr. For the HuggingFace inference method, installing flash attention is recommended. The project provides CLI tools for processing files (chandra), launching an interactive Streamlit demo (chandra_app), and setting up a vLLM inference server (chandra_vllm). The vLLM server can be run via Docker or manually configured.
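
For scripting, the documented CLI entry points can also be driven from Python. The snippet below is a rough sketch: the positional input/output arguments passed to chandra are an assumption rather than documented usage, so check chandra --help for the real interface before adopting it.

```python
# Sketch: batch-convert a folder of PDFs by shelling out to the `chandra` CLI.
# The <input> <output> positional argument order is an assumption; verify it
# with `chandra --help` before relying on this.
import subprocess
from pathlib import Path

input_dir = Path("./docs")
output_dir = Path("./converted")
output_dir.mkdir(exist_ok=True)

for pdf in sorted(input_dir.glob("*.pdf")):
    # One subprocess call per document; check=True surfaces conversion failures.
    subprocess.run(["chandra", str(pdf), str(output_dir / pdf.stem)], check=True)
```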

Highlighted Details

  • Achieves high accuracy across diverse document types, including tables, forms, and handwriting, as demonstrated by its benchmark performance.
  • Supports over 40 languages, enhancing its global applicability.
  • Faithfully reconstructs complex layouts, forms (including checkboxes), tables, and mathematical content.
  • Offers two distinct inference modes: local HuggingFace and optimized vLLM server for flexible deployment.
  • Capable of extracting images, their captions, and associated structured data (see the sketch after this list).
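
As a rough idea of how that output might be consumed downstream, the sketch below walks a conversion output directory and tallies the Markdown, JSON, and extracted image files. The directory layout and file extensions are assumptions, not documented behavior; inspect a real run and adjust the patterns accordingly.

```python
# Sketch: collect Chandra's converted output from an output directory.
# The file layout (per-document subfolders with .md/.json plus extracted
# images) is assumed, not documented; inspect a real run and adjust.
from pathlib import Path

out = Path("./converted")

markdown_files = sorted(out.rglob("*.md"))   # reconstructed document text
json_files = sorted(out.rglob("*.json"))     # structured layout/metadata, if emitted
images = sorted(p for ext in ("*.png", "*.jpg", "*.jpeg") for p in out.rglob(ext))

for md in markdown_files:
    text = md.read_text(encoding="utf-8")
    print(f"{md.relative_to(out)}: {len(text)} characters")
print(f"{len(json_files)} JSON files, {len(images)} extracted images")
```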

Maintenance & Community

A Discord server is available for discussions regarding future development and community engagement.

Licensing & Compatibility

The project's code is licensed under Apache 2.0. However, the model weights are distributed under a modified OpenRAIL-M license. This license permits free use for research, personal projects, and startups with under $2M in funding/revenue. Commercial use beyond these terms, or for entities wishing to avoid OpenRAIL restrictions, requires a separate commercial license, available via a linked pricing page.

Limitations & Caveats

The primary limitation for commercial adoption stems from the OpenRAIL-M license on model weights, which imposes restrictions on usage for larger companies or those directly competing with the provider's hosted API services. Users must carefully review these terms to ensure compliance.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 7
  • Issues (30d): 16
  • Star History: 1,592 stars in the last 27 days
