PolyglotPDF by CBIhalsen

PDF tool for layout-preserving translation

created 1 year ago
2,074 stars

Top 22.0% on sourcepulse

Project Summary

PolyglotPDF is a multilingual eBook processing tool designed for efficient, layout-preserving translation of PDF documents. It targets users needing to translate scanned or digital PDFs, offering both online and offline translation capabilities with a focus on maintaining original formatting and achieving high performance.

How It Works

The tool uses PyMuPDF to recognize and manipulate text blocks directly, which enables very fast processing of text, tables, and formulas within PDFs, often under 1 second per page. The approach prioritizes performance and cost-effectiveness: it avoids computationally intensive AI-based formula recognition and page restructuring, and focuses instead on accurate text extraction and translation while preserving layout. LLMs are now the primary translation backend, with recommended models including Doubao, Qwen, DeepSeek V3, and GPT-4o-mini.
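To make the block-level approach concrete, here is a minimal sketch of extracting text blocks with their bounding boxes. It is not the project's actual code. It assumes only the tuple layout that PyMuPDF's `page.get_text("blocks")` returns, `(x0, y0, x1, y1, text, block_no, block_type)`. The names `TextBlock`, `text_blocks`, and `extract_from_pdf` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TextBlock(NamedTuple):
    """A text block and the rectangle it occupies on the page (hypothetical type)."""
    bbox: Tuple[float, float, float, float]
    text: str


def text_blocks(raw_blocks: Iterable[tuple]) -> List[TextBlock]:
    """Keep non-empty text blocks from PyMuPDF-style block tuples.

    Each tuple is (x0, y0, x1, y1, content, block_no, block_type),
    where block_type 0 is text and 1 is an image.
    """
    result = []
    for x0, y0, x1, y1, content, _block_no, block_type in raw_blocks:
        if block_type != 0:
            continue  # skip image blocks; nothing to translate
        text = " ".join(content.split())  # collapse line breaks inside the block
        if text:
            result.append(TextBlock((x0, y0, x1, y1), text))
    return result


def extract_from_pdf(path: str) -> List[List[TextBlock]]:
    """Return the text blocks of every page in the PDF at `path`."""
    import fitz  # PyMuPDF; imported lazily so text_blocks() has no dependency

    with fitz.open(path) as doc:
        return [text_blocks(page.get_text("blocks")) for page in doc]
```

Keeping each block's bounding box next to its text is what would let a translated string be written back into the same rectangle. How PolyglotPDF itself reinserts text is not described in this summary.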

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies with pip install -r requirements.txt, configure API keys in config.json, and start the app with python app.py.
  • Docker: Pull the image with docker pull 2207397265/polyglotpdf:latest. The README covers a quick start without persistence as well as a setup with persistent storage via volume mounts.
  • Requirements: Python 3.8+ and specific versions of libraries such as PyMuPDF, Flask, and requests. No GPU is required for text-based PDFs.
  • Access: The web interface is available at http://127.0.0.1:8000 (or http://localhost:12226 when running via Docker).
  • Docs: PolyglotPDF Demo
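
After starting the app, a quick check can confirm that the web interface is responding. The sketch below uses only the Python standard library and the two addresses listed above. The helper names `interface_url` and `is_reachable` are hypothetical and not part of PolyglotPDF.

```python
import urllib.error
import urllib.request

LOCAL_URL = "http://127.0.0.1:8000"    # address listed for a direct `python app.py` run
DOCKER_URL = "http://localhost:12226"  # address listed for the Docker image


def interface_url(docker: bool = False) -> str:
    """Pick the web interface address for the chosen way of running the app."""
    return DOCKER_URL if docker else LOCAL_URL


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return False  # refused, timed out, or not a valid URL


if __name__ == "__main__":
    url = interface_url(docker=False)
    print(url, "is up" if is_reachable(url) else "is not responding")
```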

Highlighted Details

  • Ultra-fast recognition of text, tables, and formulas (~1 second per page).
  • Layout-preserving translation, with full document translations typically under 10 seconds.
  • Supports scanned documents via OCR.
  • LLM integration for translation, with support for various Chinese LLM APIs (Doubao, Qwen, DeepSeek).
  • Potential for future arXiv search and LaTeX translation rendering.

Maintenance & Community

  • The project is actively seeking feedback for improvements, particularly regarding complex color layouts and font handling.
  • A QQ group (1031477425) is available for discussions and related questions.
  • Contact information (QQ: 1421243966, email: 1421243966@qq.com) is provided.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The tool may encounter issues with unsupported color spaces during text re-editing, with a proposed workaround involving OCR for affected pages. Complex vector mathematical formulas in tables are not correctly handled. The project is described as a demonstration of layout-preserved translation and AI-assisted reading, not a comprehensive PDF editor.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 98 stars in the last 90 days
