PolyglotPDF by CBIhalsen

PDF tool for layout-preserving translation

Created 1 year ago

2,141 stars

Top 20.7% on SourcePulse

Project Summary

PolyglotPDF is a multilingual eBook processing tool designed for efficient, layout-preserving translation of PDF documents. It targets users needing to translate scanned or digital PDFs, offering both online and offline translation capabilities with a focus on maintaining original formatting and achieving high performance.

How It Works

The tool uses PyMuPDF to recognize and manipulate text blocks directly, which lets it process text, tables, and formulas in PDFs quickly, often in under 1 second per page. It deliberately avoids computationally intensive AI-based formula recognition and page restructuring, trading them for speed and low cost, and concentrates on accurate text extraction and translation with the layout preserved. LLMs are now the primary translation API; recommended models include Doubao, Qwen, DeepSeek V3, and GPT-4o-mini.
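The block-level approach described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `extract_text_blocks` and `translate_blocks` are hypothetical, and the `translate` callable stands in for whichever LLM API is configured. It relies only on PyMuPDF's `page.get_text("blocks")`, which returns one tuple per block with its bounding box, text, block number, and block type.

```python
from typing import Callable, Iterable, List, Tuple

# Shape of one entry from PyMuPDF's page.get_text("blocks"):
# (x0, y0, x1, y1, text, block_no, block_type); block_type 0 means text.
Block = Tuple[float, float, float, float, str, int, int]
BBox = Tuple[float, float, float, float]


def extract_text_blocks(pdf_path: str) -> List[List[Block]]:
    """Return the blocks of every page in the PDF, one list per page."""
    import fitz  # PyMuPDF; imported lazily so translate_blocks works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text("blocks") for page in doc]


def translate_blocks(
    blocks: Iterable[Block],
    translate: Callable[[str], str],  # hypothetical stand-in for an LLM call
) -> List[Tuple[BBox, str]]:
    """Translate text blocks while keeping each original bounding box."""
    result = []
    for x0, y0, x1, y1, text, _block_no, block_type in blocks:
        # Skip image blocks and blocks that contain only whitespace.
        if block_type != 0 or not text.strip():
            continue
        result.append(((x0, y0, x1, y1), translate(text.strip())))
    return result
```

Keeping the bounding box next to each translated string is what would allow a later step to write the translation back into the same region of the page.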

Quick Start & Requirements

Installation: Clone the repository, install dependencies via pip install -r requirements.txt, configure API keys in config.json, and run with python app.py.
Docker: Available via docker pull 2207397265/polyglotpdf:latest. The README covers both a quick start without persistence and a setup with persistent storage through volume mounts.
Requirements: Python 3.8+, specific versions of libraries like PyMuPDF, Flask, and requests. No GPU is required for text-based PDFs.
Access: Web interface available at http://127.0.0.1:8000 (or http://localhost:12226 for Docker).
Docs: PolyglotPDF Demo
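After starting the app, it can be handy to check which of the two addresses listed above is actually serving the web interface. The following is a small sketch using only the Python standard library; the helper names `http_ok` and `first_reachable` are made up for illustration and are not part of PolyglotPDF.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Addresses quoted in the summary: local run first, then the Docker mapping.
CANDIDATE_URLS = ("http://127.0.0.1:8000", "http://localhost:12226")


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return False  # refused, timed out, or otherwise unreachable


def first_reachable(
    urls: Iterable[str] = CANDIDATE_URLS,
    probe: Callable[[str], bool] = http_ok,
) -> Optional[str]:
    """Return the first URL for which the probe succeeds, else None."""
    for url in urls:
        if probe(url):
            return url
    return None


if __name__ == "__main__":
    print(first_reachable() or "PolyglotPDF web interface not reachable")
```

The `probe` parameter is injectable so the lookup logic can be exercised without a running server.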

Highlighted Details

Ultra-fast recognition of text, tables, and formulas (~1 second per page).
Layout-preserving translation, with full-document translation typically finishing in under 10 seconds.
Supports scanned documents via OCR.
LLM integration for translation, with support for various Chinese LLM APIs (Doubao, Qwen, DeepSeek).
Potential for future arXiv search and LaTeX translation rendering.

Maintenance & Community

The project is actively seeking feedback for improvements, particularly regarding complex color layouts and font handling.
A QQ group (1031477425) is available for discussions and related questions.
Contact information (QQ: 1421243966, email: 1421243966@qq.com) is provided.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Text re-editing can fail on pages that use unsupported color spaces; the proposed workaround is to run OCR on the affected pages. Complex vector mathematical formulas inside tables are not handled correctly. The project is described as a demonstration of layout-preserved translation and AI-assisted reading, not a comprehensive PDF editor.
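The OCR workaround for pages with unsupported color spaces amounts to a per-page fallback. The sketch below shows only that control flow under stated assumptions: `UnsupportedColorSpace`, `edit_text`, and `ocr_fallback` are hypothetical names, and the summary does not say how PolyglotPDF detects or reports the failure.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Page = TypeVar("Page")
Result = TypeVar("Result")


class UnsupportedColorSpace(Exception):
    """Hypothetical error raised when a page's text cannot be re-edited."""


def process_pages(
    pages: Iterable[Page],
    edit_text: Callable[[Page], Result],     # hypothetical direct text path
    ocr_fallback: Callable[[Page], Result],  # hypothetical OCR path
) -> List[Tuple[str, Result]]:
    """Try direct text re-editing per page; fall back to OCR on failure."""
    outcomes = []
    for page in pages:
        try:
            outcomes.append(("text", edit_text(page)))
        except UnsupportedColorSpace:
            # Only the affected page takes the slower OCR route.
            outcomes.append(("ocr", ocr_fallback(page)))
    return outcomes
```

Handling the fallback per page keeps the fast text path for every page that does not trigger the problem.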

Explore Similar Projects

FreePDF by zstar1003

Index_PDF_Translation by Mega-Gorilla

docutranslate by xunbu

llm-based-ocr by yigitkonur

HunyuanOCR by Tencent-Hunyuan

zotero-pdf2zh by guaguastandup

deepdoctection by deepdoctection

pdf-craft by oomol-lab

STranslate by STranslate

dots.ocr by rednote-hilab

manga-image-translator by zyddnys

LunaTranslator by HIllya51