yfedoseev
High-performance PDF toolkit for diverse applications
Top 70.3% on SourcePulse
PDF Oxide is a high-performance PDF processing toolkit built on a Rust core and available as a Python package, a Rust crate, a WASM module, and a CLI. It provides fast and reliable text and image extraction, markdown conversion, and PDF manipulation, and targets developers and researchers who need efficient document processing for RAG/LLM pipelines, AI assistants, and large-scale data extraction. Its stated advantages over existing libraries are faster processing, higher reliability, and a permissive license.
How It Works
The project uses a Rust backend for core PDF parsing and manipulation, which is the source of its speed and memory efficiency. This core is exposed through native Rust APIs, Python bindings (built with maturin), and WebAssembly for browser and Node.js environments. A dedicated CLI tool and an MCP server for AI assistants broaden its applicability further. Because every interface wraps the same Rust core, the speed and reliability are intended to carry over across platforms and use cases.
Quick Start & Requirements
Python: pip install pdf_oxide. Supports Python 3.8–3.14. Wheels are available for Linux, macOS, and Windows.
Rust: add pdf_oxide = "0.3" to Cargo.toml.
CLI: install via Homebrew (brew install yfedoseev/tap/pdf-oxide) or Cargo (cargo install pdf_oxide_cli).
WASM: npm install pdf-oxide-wasm.
MCP server: install via Cargo (cargo install pdf_oxide_mcp). Configuration details for AI assistants like Claude and Cursor are provided.
From source: cargo build --release, and maturin develop for Python bindings.
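The Python requirements above (Python 3.8–3.14, installed with pip install pdf_oxide) can be checked before running anything that depends on the library. The following is a minimal sketch using only the standard library; the helper names python_is_supported and package_is_installed are hypothetical, and it assumes the import name matches the pip package name pdf_oxide, which the summary does not state explicitly. It does not call any PDF Oxide API.

```python
import importlib.util
import sys

# Supported range as stated in the Quick Start text: Python 3.8 through 3.14.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 14)


def python_is_supported(version=None):
    """Return True if (major, minor) falls inside the documented range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


def package_is_installed(module_name="pdf_oxide"):
    """Return True if the module can be located, without importing it.

    Assumption: the import name is the same as the pip package name.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("pdf_oxide installed:", package_is_installed())
```

Using importlib.util.find_spec avoids importing the native extension just to test for its presence, which keeps the check cheap in environments where the wheel may be missing.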
Maintenance & Community
The project is maintained by Yury Fedoseev, with the source code available on GitHub. No specific community channels (e.g., Discord, Slack) or sponsorship details are mentioned in the README.
Licensing & Compatibility
PDF Oxide is dual-licensed under MIT or Apache-2.0, allowing for free use in both commercial and open-source projects without the copyleft restrictions found in AGPL-licensed alternatives.
Limitations & Caveats
The library is at version 0.3.14, indicating active development. While it claims a 100% pass rate on valid PDFs, the README notes specific intentionally broken test fixtures that do not pass. No other explicit limitations or unsupported platforms are detailed.