ferrules by AmineDiro

Rust-based CLI tool for high-performance document parsing

created 7 months ago
509 stars

Top 61.2% on SourcePulse

Project Summary

Ferrules is a high-performance document parsing library written in Rust, designed to efficiently generate LLM-ready documents. It targets developers and researchers who need fast, robust document processing, and positions itself as a faster alternative to Python-based tools.

How It Works

Ferrules uses pdfium2 for PDF parsing and, on macOS, Apple's Vision framework for OCR through the objc2 Rust bindings. It extracts and analyzes document layouts with preprocessing and postprocessing steps, and merges the extracted text lines with the detected layout regions. For accelerated inference it uses the ort library (Rust bindings for ONNX Runtime), with support for the Apple Neural Engine (ANE) and GPU. It also groups related elements such as captions and footers, detects headings with a machine-learning model, and renders output as HTML, Markdown, or JSON.
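To make the line-merging step more concrete, here is a minimal Rust sketch of one common approach: assign each text line to the layout region that covers the largest share of the line's bounding box. The types BBox, TextLine and LayoutRegion, the function assign_lines, and the 0.5 coverage threshold are hypothetical illustrations, not Ferrules' actual API or algorithm.

```rust
/// Hypothetical axis-aligned box: (x0, y0) top-left, (x1, y1) bottom-right.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox { x0: f32, y0: f32, x1: f32, y1: f32 }

impl BBox {
    fn area(&self) -> f32 {
        (self.x1 - self.x0).max(0.0) * (self.y1 - self.y0).max(0.0)
    }
    /// Area of the overlap between two boxes (0.0 when they do not overlap).
    fn intersection_area(&self, other: &BBox) -> f32 {
        let w = self.x1.min(other.x1) - self.x0.max(other.x0);
        let h = self.y1.min(other.y1) - self.y0.max(other.y0);
        if w <= 0.0 || h <= 0.0 { 0.0 } else { w * h }
    }
}

#[derive(Debug)]
struct TextLine { text: String, bbox: BBox }

#[derive(Debug)]
struct LayoutRegion { label: String, bbox: BBox }

/// For each line, return the index of the region covering the largest fraction
/// of the line's area, or None if no region reaches `min_coverage`.
fn assign_lines(lines: &[TextLine], regions: &[LayoutRegion], min_coverage: f32) -> Vec<Option<usize>> {
    lines
        .iter()
        .map(|line| {
            let line_area = line.bbox.area();
            if line_area <= 0.0 {
                return None; // degenerate box: nothing to match
            }
            regions
                .iter()
                .enumerate()
                .map(|(i, r)| (i, line.bbox.intersection_area(&r.bbox) / line_area))
                .filter(|&(_, coverage)| coverage >= min_coverage)
                .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                .map(|(i, _)| i)
        })
        .collect()
}

fn main() {
    let regions = vec![
        LayoutRegion { label: "Title".into(), bbox: BBox { x0: 0.0, y0: 0.0, x1: 100.0, y1: 20.0 } },
        LayoutRegion { label: "Text".into(), bbox: BBox { x0: 0.0, y0: 25.0, x1: 100.0, y1: 80.0 } },
    ];
    let lines = vec![
        TextLine { text: "Introduction".into(), bbox: BBox { x0: 5.0, y0: 5.0, x1: 60.0, y1: 15.0 } },
        TextLine { text: "Body text line".into(), bbox: BBox { x0: 5.0, y0: 30.0, x1: 90.0, y1: 40.0 } },
        TextLine { text: "Page 1".into(), bbox: BBox { x0: 40.0, y0: 90.0, x1: 60.0, y1: 98.0 } },
    ];
    for (line, slot) in lines.iter().zip(assign_lines(&lines, &regions, 0.5)) {
        match slot {
            Some(i) => println!("{:?} -> {}", line.text, regions[i].label),
            None => println!("{:?} -> (unassigned)", line.text),
        }
    }
}
```

Lines that fall outside every region (like a page number in the margin) stay unassigned in this sketch; a real pipeline would likely route them to a separate header/footer pass.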

Quick Start & Requirements

  • Installation: Precompiled binaries are available for macOS via GitHub Releases.
  • Prerequisites: macOS (Linux with NVIDIA GPU support coming soon).
  • Usage: run the CLI with ferrules path/to/your.pdf, or start the API server with ferrules-api.
  • Resources: GitHub Releases
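A minimal Rust sketch of invoking the documented CLI form (ferrules path/to/your.pdf) from another program, assuming the ferrules binary from GitHub Releases is installed and on PATH. The helper name ferrules_command is hypothetical, and no flags beyond the input path are passed because none are documented here.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) the documented invocation `ferrules <pdf>`.
/// Assumes the `ferrules` binary is on PATH.
fn ferrules_command(pdf: &Path) -> Command {
    let mut cmd = Command::new("ferrules");
    cmd.arg(pdf);
    cmd
}

fn main() {
    let mut cmd = ferrules_command(Path::new("path/to/your.pdf"));
    // `status()` runs the command and waits for it; failure to launch is reported, not panicked on.
    match cmd.status() {
        Ok(status) if status.success() => println!("ferrules finished successfully"),
        Ok(status) => eprintln!("ferrules exited with {status}"),
        Err(err) => eprintln!("could not launch ferrules (is it on PATH?): {err}"),
    }
}
```

Separating command construction from execution keeps the invocation easy to inspect or test without the binary present.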

Highlighted Details

  • Zero-dependency deployment, eliminating the need for a Python runtime.
  • Hardware-accelerated ML inference on Apple Neural Engine (ANE) and GPU.
  • Intelligent document structuring, including heading detection and element grouping.
  • Offers both CLI and HTTP API interfaces for flexible integration.
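As an illustration of what structured, LLM-ready output can look like, the following Rust sketch renders a hypothetical list of detected elements (headings, paragraphs, captions, footers) to Markdown. The Element enum and the to_markdown function are assumptions made for this example; Ferrules' real document model and Markdown renderer may differ.

```rust
/// Hypothetical document model; not Ferrules' actual types.
#[derive(Debug)]
enum Element {
    Heading { level: usize, text: String },
    Paragraph(String),
    Caption(String),
    Footer(String),
}

/// Render elements as Markdown blocks separated by blank lines.
/// Footers are dropped (a design choice of this sketch).
fn to_markdown(elements: &[Element]) -> String {
    let mut blocks = Vec::new();
    for el in elements {
        match el {
            Element::Heading { level, text } => {
                // Clamp to Markdown's six heading levels.
                let hashes = "#".repeat((*level).clamp(1, 6));
                blocks.push(format!("{hashes} {text}"));
            }
            Element::Paragraph(text) => blocks.push(text.clone()),
            Element::Caption(text) => blocks.push(format!("*{text}*")),
            Element::Footer(_) => {}
        }
    }
    blocks.join("\n\n")
}

fn main() {
    let doc = vec![
        Element::Heading { level: 1, text: "Results".into() },
        Element::Paragraph("Body text goes here.".into()),
        Element::Caption("Figure 1: Example caption.".into()),
        Element::Footer("Page 3".into()),
    ];
    println!("{}", to_markdown(&doc));
}
```

Dropping footers reflects the idea that repeated page furniture is usually noise for LLM input; a real renderer might instead keep them as metadata.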

Maintenance & Community

The project is marked as "Work in Progress" with a roadmap available for upcoming features.

Licensing & Compatibility

The license is not explicitly stated in the README.

Limitations & Caveats

Linux support, particularly with NVIDIA GPU acceleration, is still under development. Configurable inference parameters are listed as "COMING SOON."

Health Check
Last commit: 1 week ago
Responsiveness: 1 week
Pull requests (30d): 0
Issues (30d): 1
Star history: 14 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Jerry Liu (Cofounder of LlamaIndex).

sparrow by katanaml

0.2%
5k
Data processing & instruction calling tool using ML, LLM, and Vision LLM
created 3 years ago
updated 1 month ago