comic-translate by ogkalu2

Desktop app for translating comics in multiple formats/languages

Created 1 year ago

2,323 stars

Top 19.4% on SourcePulse

Project Summary

This project provides a desktop application that automatically translates comics in several file formats (images, PDF, EPUB, CBR, CBZ) between multiple languages. It targets comic readers and creators who want to get past language barriers in comics published worldwide, and it relies on LLMs such as GPT-4o for translation.

How It Works

The application runs a four-stage pipeline. First, YOLOv8 models detect speech bubbles and segment the text inside them. Second, the text is read by Optical Character Recognition (OCR), using either specialized libraries (doctr, manga-ocr, Pororo, PaddleOCR) or paid LLM/cloud services for higher accuracy. Third, the original text is removed from the page by inpainting with a fine-tuned LaMa model. Finally, the text is translated by an LLM (GPT-4o, Claude, Gemini) or a translation API (DeepL, Yandex, Google Translate); the page image can optionally be passed along as context to improve translation quality.
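The stage order described above can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the project's actual code: the names run_pipeline, TextBlock, and Page, and the four callables passed in, are hypothetical stand-ins for the detection, OCR, inpainting, and translation back ends.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical bounding box: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TextBlock:
    """One detected text region and the text attached to it."""
    box: Box
    source_text: str = ""
    translated_text: str = ""


@dataclass
class Page:
    """A cleaned page image plus its text blocks."""
    image: object  # placeholder for pixel data; the real type is an assumption
    blocks: List[TextBlock] = field(default_factory=list)


def run_pipeline(
    image: object,
    detect: Callable[[object], List[Box]],
    ocr: Callable[[object, Box], str],
    inpaint: Callable[[object, List[Box]], object],
    translate: Callable[[List[str]], List[str]],
) -> Page:
    """Run detection, OCR, inpainting, and translation in that order."""
    # Stage 1: find text regions on the page.
    boxes = detect(image)
    # Stage 2: read the source text from each region of the original image.
    blocks = [TextBlock(box=b, source_text=ocr(image, b)) for b in boxes]
    # Stage 3: erase the original text so translations can be drawn over it.
    cleaned = inpaint(image, boxes)
    # Stage 4: translate all texts of the page in one call, preserving order.
    translations = translate([b.source_text for b in blocks])
    for block, text in zip(blocks, translations):
        block.translated_text = text
    return Page(image=cleaned, blocks=blocks)
```

Passing the stages in as callables mirrors the idea that OCR engines and translators are interchangeable; any function with a matching signature can be plugged in, including simple stubs for testing.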

Quick Start & Requirements

Install: Clone the repository, install Python 3.12, and use uv for dependency management (uv init --python 3.12, uv add -r requirements.txt --compile-bytecode).
Prerequisites: Python 3.12, Git, uv. For CBR files, WinRAR or 7-Zip added to PATH. NVIDIA GPU with CUDA 12.6+ recommended for PyTorch.
API Keys: Required for premium translation (GPT-4o) and OCR (GPT-4o, Google Cloud Vision, Azure Vision).
Docs: OpenAI Platform, Google Cloud Vision.
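Before installing, the prerequisites listed above can be checked with a short script. This is a rough sketch, not part of the project: check_prerequisites is a hypothetical helper, and the executable names "7z" and "WinRAR" are assumptions about how 7-Zip and WinRAR appear on PATH, which may differ per system.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(
    required_python: Tuple[int, int] = (3, 12),
    tools: Tuple[str, ...] = ("git", "uv"),
    cbr_tools: Tuple[str, ...] = ("7z", "WinRAR"),  # assumed executable names
) -> Dict[str, bool]:
    """Report which prerequisites appear to be available on this machine."""
    # Compare only major.minor of the running interpreter.
    report = {"python": tuple(sys.version_info[:2]) == tuple(required_python)}
    # shutil.which returns None when a command is not found on PATH.
    for tool in tools:
        report[tool] = shutil.which(tool) is not None
    # One archiver on PATH is enough for CBR support.
    report["cbr_extractor"] = any(
        shutil.which(name) is not None for name in cbr_tools
    )
    return report
```

Calling check_prerequisites() returns a dictionary of booleans keyed by "python", "git", "uv", and "cbr_extractor". It does not check for a GPU or CUDA, since those are recommended rather than required.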

Highlighted Details

Supports a wide range of comic formats including images, PDF, EPUB, CBR, and CBZ.
Leverages SOTA LLMs like GPT-4o for translation, claiming superior performance over traditional services for distant language pairs.
Offers manual correction mode for addressing issues in automatic processing.
Integrates multiple OCR engines and translation services, allowing users to choose based on cost and quality.
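As background on the archive formats in the list above: a CBZ file is an ordinary ZIP archive, so its pages can be listed with Python's standard zipfile module, whereas CBR is a RAR archive and generally needs an external extractor. The sketch below only illustrates that difference; list_cbz_pages and IMAGE_SUFFIXES are hypothetical names, and nothing here is claimed to be how the application itself reads archives.

```python
import zipfile
from pathlib import PurePosixPath
from typing import List

# Assumed set of page image extensions; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def list_cbz_pages(path) -> List[str]:
    """Return the image entries of a CBZ (ZIP) archive, sorted by name."""
    # zipfile.ZipFile accepts a filesystem path or a binary file-like object.
    with zipfile.ZipFile(path) as archive:
        names = [
            name
            for name in archive.namelist()
            # Archive member names always use forward slashes.
            if PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES
        ]
    # Plain lexicographic sort; pages are usually zero-padded (001.jpg, ...).
    return sorted(names)
```

Sorting is lexicographic, so archives whose page names are not zero-padded (1.jpg, 10.jpg, 2.jpg) would need a natural-sort key instead.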

Maintenance & Community

The project lists several GitHub repositories in its acknowledgments, indicating reliance on various open-source components. No specific community channels (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project's dependencies include libraries with various licenses, which may impose restrictions on commercial use or redistribution.

Limitations & Caveats

The most advanced features require API keys for paid services, so using them incurs costs. The font must be chosen carefully, because text in the target language only renders correctly with a suitable font. Opening CBR files requires an external archiving tool (WinRAR or 7-Zip) on the system's PATH.

Health Check

Last Commit

3 days ago

Responsiveness

1 day

Star History

51 stars in the last 30 days