docuglean-ocr by cernis-intelligence

Intelligent document processing SDK for AI-powered data extraction

Created 5 months ago

307 stars

Top 87.7% on SourcePulse

Project Summary

Docuglean is a unified SDK for intelligent document processing. It extracts structured data such as JSON, Markdown, and HTML from documents using state-of-the-art AI models. It targets engineers and power users who need to automate document analysis, and it offers multilingual and multimodal capabilities through plug-and-play APIs for OCR, data extraction, classification, summarization, and translation. The SDK aims to simplify complex document workflows with easy-to-use interfaces and broad AI provider support.

How It Works

Docuglean provides a unified SDK with plug-and-play APIs for various document processing tasks. It works with multiple AI providers, including OpenAI, Mistral, Google Gemini, and Hugging Face, and accepts multimodal inputs (PDFs and images). A key advantage is type-safe structured data extraction: the caller defines the expected output as a Zod (TypeScript) or Pydantic (Python) schema, and the extracted data is validated against it. The SDK also includes built-in local parsers for common formats (DOCX, PPTX, XLSX, CSV, TSV, and PDF), which reduces external dependencies for basic parsing.
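To make the idea of schema-validated extraction concrete, here is a minimal sketch of the general pattern. It does not call Docuglean and does not reproduce its API. It uses a standard-library dataclass as a stand-in for a Pydantic model so that it runs without extra packages, and the Invoice schema, its field names, and the parse_structured helper are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Invoice:
    """Hypothetical target schema; the field names are illustrative only."""
    invoice_number: str
    total: float
    currency: str


def parse_structured(raw_json, schema):
    """Parse a JSON reply and check each field against the schema's declared type.

    Stand-in for what a Pydantic or Zod schema would do; not Docuglean's API.
    """
    data = json.loads(raw_json)
    kwargs = {}
    for name, expected in get_type_hints(schema).items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # JSON does not distinguish 120 from 120.0, so widen ints to float.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
        kwargs[name] = value
    return schema(**kwargs)


# Simulated model reply; a real reply would come from the chosen AI provider.
reply = '{"invoice_number": "INV-001", "total": 120, "currency": "USD"}'
invoice = parse_structured(reply, Invoice)
print(invoice)
```

The point of the pattern is that malformed or incomplete model output fails loudly at the boundary instead of propagating into downstream code as an untyped dictionary.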

Quick Start & Requirements

Primary Install:
- Node.js/TypeScript: npm install docuglean-ocr
- Python: pip install docuglean
Prerequisites: API keys are required for AI providers (OpenAI, Mistral, Google Gemini, Hugging Face). Local parsers for DOCX, PPTX, XLSX, CSV, TSV, and PDF do not require an API key.
Links: Code examples for Quick Start are provided within the README.
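Since the AI-backed features need a provider key, one common approach is to read the key from an environment variable and fail early when it is missing. The sketch below assumes that approach; the variable names in the mapping and the resolve_api_key helper are assumptions for illustration, not names confirmed by the Docuglean README.

```python
import os

# Hypothetical mapping from provider name to the environment variable that
# holds its key. These names are assumptions; check the README for real ones.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "huggingface": "HF_TOKEN",
}


def resolve_api_key(provider, env=None):
    """Return the API key for a provider, or raise with a clear message."""
    env = os.environ if env is None else env
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unknown provider: {provider}")
    var_name = PROVIDER_KEY_VARS[provider]
    key = env.get(var_name)
    if not key:
        raise RuntimeError(f"set {var_name} before using the {provider} provider")
    return key
```

Passing env explicitly, for example resolve_api_key("openai", {"OPENAI_API_KEY": "sk-test"}), keeps the helper easy to test without touching the real environment.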

Highlighted Details

Easy-to-use API with detailed documentation and type hints.
OCR capabilities for extracting text from images and scanned documents.
Structured data extraction via Zod/Pydantic schemas for type-safe output.
Document classification for intelligently splitting multi-section documents.
Multimodal support for processing PDFs and images.
Support for multiple AI providers and models.
Batch processing for concurrent document handling with automatic error handling.
Built-in local parsers for DOCX, PPTX, XLSX, CSV, TSV, and PDF.
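The batch-processing item above describes concurrent handling with per-document error handling. The following is a minimal sketch of that general technique using the Python standard library. It is not Docuglean's batch API: process_batch, fake_ocr, and the result dictionary layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(paths, process_one, max_workers=4):
    """Run process_one on every path concurrently.

    A failure on one document is captured in its own result entry, so the
    rest of the batch still completes.
    """
    def safe(path):
        try:
            return {"path": path, "ok": True, "result": process_one(path)}
        except Exception as exc:
            return {"path": path, "ok": False, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(safe, paths))


def fake_ocr(path):
    """Placeholder for a real per-document call (OCR, extraction, and so on)."""
    if path.endswith(".bad"):
        raise ValueError("unsupported file")
    return f"text of {path}"


results = process_batch(["a.pdf", "b.bad", "c.png"], fake_ocr)
for entry in results:
    print(entry)
```

Threads suit this workload because calls to remote AI providers spend most of their time waiting on the network.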

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or a public roadmap were found in the provided text. The "Coming Soon" section indicates ongoing development.

Licensing & Compatibility

License Type: Apache 2.0.
Compatibility: The permissive license allows commercial use. Notably, PDF processing uses pdftext (Apache/BSD) rather than AGPL-licensed alternatives such as PyMuPDF.

Limitations & Caveats

Some capabilities are planned but not yet available, including integration with more AI models and providers (e.g., Llama, Together AI, OpenRouter) and expanded multilingual support. Running the provided examples requires obtaining and configuring API keys for the chosen AI providers.

