Document intelligence framework for Python
Kreuzberg is a Python framework for document intelligence, designed to extract text, metadata, and structured data from a wide array of document formats. It targets developers and researchers needing a unified, high-performance solution for document processing, offering robust capabilities through an extensible API.
How It Works
Kreuzberg unifies document processing by leveraging established open-source libraries like Pandoc for format conversion, PDFium for PDF rendering, and Tesseract for OCR. This approach ensures broad format support and accurate extraction. It features a plugin architecture for custom extractors and provides both synchronous and asynchronous APIs for flexibility in different application contexts.
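The plugin-plus-dual-API design described above can be sketched in a few lines. This is a generic illustration of the pattern, not Kreuzberg's actual code or API: every name here (`ExtractionResult`, `register_extractor`, `run_extractor_sync`, `run_extractor`) is hypothetical, and the only "format" handled is plain text so the example stays self-contained.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ExtractionResult:
    """Hypothetical result container: extracted text plus metadata."""
    content: str
    mime_type: str
    metadata: dict = field(default_factory=dict)


# Hypothetical plugin registry: maps a MIME type to an extractor callable.
Extractor = Callable[[bytes], ExtractionResult]
_REGISTRY: Dict[str, Extractor] = {}


def register_extractor(mime_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers a custom extractor for one MIME type."""
    def decorator(func: Extractor) -> Extractor:
        _REGISTRY[mime_type] = func
        return func
    return decorator


@register_extractor("text/plain")
def extract_plain_text(data: bytes) -> ExtractionResult:
    """Trivial extractor: decode UTF-8 and record the character count."""
    text = data.decode("utf-8")
    return ExtractionResult(
        content=text,
        mime_type="text/plain",
        metadata={"characters": len(text)},
    )


def run_extractor_sync(data: bytes, mime_type: str) -> ExtractionResult:
    """Synchronous entry point: look up the extractor and call it."""
    try:
        extractor = _REGISTRY[mime_type]
    except KeyError:
        raise ValueError(f"no extractor registered for {mime_type!r}") from None
    return extractor(data)


async def run_extractor(data: bytes, mime_type: str) -> ExtractionResult:
    """Asynchronous entry point: run the blocking work in a worker thread."""
    return await asyncio.to_thread(run_extractor_sync, data, mime_type)
```

Both entry points share one code path, so a custom extractor only has to be registered once. A script can call `run_extractor_sync(data, "text/plain")` directly, while a web service can `await run_extractor(data, "text/plain")` without blocking its event loop.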
Quick Start & Requirements
Install with pip install kreuzberg, or pip install kreuzberg[all] for full features.
Highlighted Details
Maintenance & Community
The project is maintained by Goldziher. No specific community channels (Discord/Slack) or roadmap links are provided in the README.
Licensing & Compatibility
The project is released under the MIT License, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
The README does not detail specific limitations, unsupported features, or known issues. The performance benchmarks are presented without explicit methodology details, though a link to "detailed analysis" is provided.