Desktop app for translating comics in multiple formats/languages
This project provides a desktop application for automatically translating comic books across various formats (images, PDFs, CBR/CBZ) and multiple languages. It targets comic enthusiasts and creators looking to overcome language barriers in global comic content, leveraging state-of-the-art LLMs for high-quality translation.
How It Works
The application runs a multi-stage pipeline. First, YOLOv8 models detect speech bubbles and segment the text. Next, Optical Character Recognition (OCR) is performed with specialized libraries (docTR, manga-ocr, Pororo, PaddleOCR) or, for higher accuracy, with paid LLM or cloud services. The original text is then removed by inpainting with a fine-tuned LaMa model. Finally, the text is translated using either an LLM (GPT-4o, Claude, Gemini) or a translation API (DeepL, Yandex, Google Translate), with the option to provide image context for improved translation quality.
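The four stages described above can be sketched as a small orchestration function. This is a minimal illustration, not the project's actual code: the names `TextBlock`, `Page`, and `run_pipeline`, and the signatures of the four stage callables, are all hypothetical. The real detection, OCR, inpainting, and translation backends would be plugged in as the callables.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence, Tuple

# (x1, y1, x2, y2) pixel coordinates of a detected text region (assumed layout).
Box = Tuple[int, int, int, int]


@dataclass
class TextBlock:
    """One detected text region and the text associated with it."""
    box: Box
    source_text: str = ""
    translated_text: str = ""


@dataclass
class Page:
    """A page image plus the text blocks found on it."""
    image: Any
    blocks: List[TextBlock] = field(default_factory=list)


def run_pipeline(
    image: Any,
    detect: Callable[[Any], Sequence[Box]],
    ocr: Callable[[Any, Box], str],
    inpaint: Callable[[Any, Sequence[Box]], Any],
    translate: Callable[[Sequence[str]], Sequence[str]],
) -> Page:
    """Run detection, OCR, inpainting, and translation in that order.

    Every stage is passed in as a callable, so any backend can be swapped
    without changing the orchestration. All signatures here are assumptions.
    """
    page = Page(image=image)

    # Stage 1: locate speech bubbles / text regions.
    page.blocks = [TextBlock(box=box) for box in detect(image)]

    # Stage 2: read the source text in each region from the ORIGINAL image.
    for block in page.blocks:
        block.source_text = ocr(image, block.box)

    # Stage 3: erase the original text so translations can be drawn later.
    page.image = inpaint(image, [block.box for block in page.blocks])

    # Stage 4: translate all blocks in one call so the backend sees them together.
    translations = translate([block.source_text for block in page.blocks])
    for block, text in zip(page.blocks, translations):
        block.translated_text = text

    return page
```

OCR runs before inpainting so that the text is read from the untouched image. Translating all blocks in a single call is a design choice in this sketch, made so that an LLM backend could use the surrounding dialogue as context.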
Quick Start & Requirements
Dependencies are managed with uv: run uv init --python 3.12, then uv add -r requirements.txt --compile-bytecode. For CBR files, WinRAR or 7-Zip must be added to PATH. An NVIDIA GPU with CUDA 12.6+ is recommended for PyTorch.
Highlighted Details
Maintenance & Community
The project lists several GitHub repositories in its acknowledgments, indicating reliance on various open-source components. No specific community channels (Discord, Slack) or roadmap are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. The project's dependencies include libraries with various licenses, which may impose restrictions on commercial use or redistribution.
Limitations & Caveats
The most advanced features rely on paid services and require API keys, which incurs usage costs. A font that supports the target language must be selected for the translated text to render correctly. Handling CBR files requires an external archiving tool (WinRAR or 7-Zip) on the system's PATH.
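The PATH requirement can be checked before opening an archive. The sketch below uses only the standard library. The executable names in `ARCHIVER_CANDIDATES` are assumptions about how WinRAR and 7-Zip are commonly installed, and `find_archiver` is a hypothetical helper, not part of the application.

```python
import shutil
from typing import Iterable, Optional

# Assumed command names for 7-Zip and WinRAR; adjust to match the local install.
ARCHIVER_CANDIDATES = ("7z", "7za", "unrar", "rar", "WinRAR")


def find_archiver(candidates: Iterable[str] = ARCHIVER_CANDIDATES) -> Optional[str]:
    """Return the full path of the first archiver found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # searches PATH (and PATHEXT on Windows)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    archiver = find_archiver()
    if archiver is None:
        print("No archiver found on PATH; CBR files may fail to open.")
    else:
        print(f"Archiver found: {archiver}")
```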