Rust-based CLI tool for high-performance document parsing
Ferrules is a high-performance document parsing library written in Rust, designed to efficiently generate LLM-ready documents. It targets developers and researchers who need fast, robust document processing, and it positions itself as a faster alternative to Python-based tools.
How It Works
Ferrules uses pdfium2 for PDF parsing and, on macOS, integrates Apple's Vision framework for OCR through the objc2 Rust bindings. It extracts and analyzes document layouts with preprocessing and postprocessing steps, merging the extracted text lines with the detected layout regions. For accelerated inference, it uses the ort library, which supports Apple Neural Engine (ANE) and GPU acceleration. The library groups related elements such as captions and footers, detects headings using machine learning, and renders output as HTML, Markdown, or JSON.
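The merge-and-render steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Ferrules' implementation: the types (`BBox`, `TextLine`, `LayoutRegion`, `RegionKind`) and functions are hypothetical, each text line is assigned to the layout region it overlaps most by area, and the Markdown rules (headings as `##`, captions in italics, footers dropped) are choices made only for this example.

```rust
// Hypothetical sketch: types and functions are illustrative, not Ferrules' API.

/// Axis-aligned box in page coordinates: (x0, y0) top-left, (x1, y1) bottom-right.
#[derive(Debug, Clone, Copy)]
struct BBox {
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

impl BBox {
    /// Overlap area of two boxes; 0.0 when they do not intersect.
    fn intersection_area(&self, other: &BBox) -> f32 {
        let w = (self.x1.min(other.x1) - self.x0.max(other.x0)).max(0.0);
        let h = (self.y1.min(other.y1) - self.y0.max(other.y0)).max(0.0);
        w * h
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum RegionKind {
    Heading,
    Text,
    Caption,
    Footer,
}

struct TextLine {
    text: String,
    bbox: BBox,
}

struct LayoutRegion {
    kind: RegionKind,
    bbox: BBox,
    lines: Vec<String>,
}

fn bb(x0: f32, y0: f32, x1: f32, y1: f32) -> BBox {
    BBox { x0, y0, x1, y1 }
}

/// Attach each text line to the region it overlaps most (by area).
/// Lines that overlap no region are dropped in this sketch.
fn merge_lines_into_regions(lines: Vec<TextLine>, regions: &mut [LayoutRegion]) {
    for line in lines {
        let best = regions
            .iter_mut()
            .map(|r| (r.bbox.intersection_area(&line.bbox), r))
            .filter(|(overlap, _)| *overlap > 0.0)
            .max_by(|a, b| a.0.total_cmp(&b.0));
        if let Some((_, region)) = best {
            region.lines.push(line.text);
        }
    }
}

/// Render regions (assumed already in reading order) as Markdown.
fn render_markdown(regions: &[LayoutRegion]) -> String {
    let mut blocks = Vec::new();
    for region in regions.iter().filter(|r| !r.lines.is_empty()) {
        let body = region.lines.join(" ");
        match region.kind {
            RegionKind::Heading => blocks.push(format!("## {body}")),
            RegionKind::Text => blocks.push(body),
            RegionKind::Caption => blocks.push(format!("*{body}*")),
            RegionKind::Footer => {} // page furniture: omitted in this sketch
        }
    }
    blocks.join("\n\n")
}

/// A tiny hand-made page: one heading, one text block, one footer.
fn sample_page() -> (Vec<TextLine>, Vec<LayoutRegion>) {
    let region = |kind, bbox| LayoutRegion { kind, bbox, lines: Vec::new() };
    let line = |text: &str, bbox| TextLine { text: text.to_string(), bbox };
    let regions = vec![
        region(RegionKind::Heading, bb(0.0, 0.0, 100.0, 20.0)),
        region(RegionKind::Text, bb(0.0, 25.0, 100.0, 80.0)),
        region(RegionKind::Footer, bb(0.0, 90.0, 100.0, 100.0)),
    ];
    let lines = vec![
        line("Introduction", bb(5.0, 2.0, 60.0, 15.0)),
        line("Ferrules parses PDFs.", bb(5.0, 30.0, 95.0, 40.0)),
        line("It is written in Rust.", bb(5.0, 42.0, 95.0, 52.0)),
        line("Page 1", bb(40.0, 92.0, 60.0, 98.0)),
    ];
    (lines, regions)
}

fn main() {
    let (lines, mut regions) = sample_page();
    merge_lines_into_regions(lines, &mut regions);
    // Prints the heading and the merged paragraph; the footer line is dropped.
    println!("{}", render_markdown(&regions));
}
```

Assigning by overlap area, rather than by a line's center point, keeps a line that straddles a region boundary with whichever region holds most of it. A real pipeline would also need reading-order sorting, which this sketch takes as given.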
Quick Start & Requirements
CLI usage: ferrules path/to/your.pdf
API server: ferrules-api
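To process a folder of PDFs, the documented `ferrules path/to/your.pdf` form can be driven from a small Rust program. This sketch assumes only that a `ferrules` binary is on the `PATH` and accepts a single PDF path as its positional argument; it passes no other flags, and the helper names (`ferrules_command`, `pdfs_in`) and the default directory (the first program argument, or the current directory) are hypothetical choices for the example.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build a `ferrules <pdf>` invocation, the only form shown in the Quick Start.
/// No other flags are assumed.
fn ferrules_command(pdf: &Path) -> Command {
    let mut cmd = Command::new("ferrules");
    cmd.arg(pdf);
    cmd
}

/// Collect regular files with a `.pdf` extension (case-insensitive) directly inside `dir`, sorted.
fn pdfs_in(dir: &Path) -> std::io::Result<Vec<PathBuf>> {
    let mut pdfs = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        let is_pdf = path
            .extension()
            .map_or(false, |ext| ext.eq_ignore_ascii_case("pdf"));
        if is_pdf && path.is_file() {
            pdfs.push(path);
        }
    }
    pdfs.sort();
    Ok(pdfs)
}

fn main() -> std::io::Result<()> {
    // Directory to scan: first CLI argument, or the current directory.
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    for pdf in pdfs_in(Path::new(&dir))? {
        // Run files one at a time and report failures without aborting the batch.
        match ferrules_command(&pdf).status() {
            Ok(status) if status.success() => {}
            Ok(status) => eprintln!("ferrules failed on {}: {status}", pdf.display()),
            Err(err) => eprintln!("could not start ferrules for {}: {err}", pdf.display()),
        }
    }
    Ok(())
}
```

Usage: `cargo run -- path/to/folder`, or run with no argument to scan the current directory.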
Highlighted Details
Maintenance & Community
The project is marked as "Work in Progress" with a roadmap available for upcoming features.
Licensing & Compatibility
The license is not explicitly stated in the README.
Limitations & Caveats
Linux support, particularly with NVIDIA GPU acceleration, is still under development. Configurable inference parameters are listed as "COMING SOON."