comic-translate  by ogkalu2

Desktop app for translating comics in multiple formats/languages

created 1 year ago
1,908 stars

Top 23.3% on sourcepulse

Project Summary

This project provides a desktop application for automatically translating comics across several file formats (images, PDF, EPUB, CBR/CBZ) and multiple languages. It targets comic enthusiasts and creators who want to overcome language barriers in global comic content, and it relies on state-of-the-art LLMs for high-quality translation.

How It Works

The application runs a multi-stage pipeline. YOLOv8 models first detect speech bubbles and segment the text. OCR then extracts the text using specialized libraries (doctr, manga-ocr, Pororo, PaddleOCR) or, for higher accuracy, paid LLM or cloud services. A fine-tuned LaMa model removes the original text via inpainting. Finally, the text is translated with a choice of LLMs (GPT-4o, Claude, Gemini) or translation APIs (DeepL, Yandex, Google Translate), with the option to provide image context for improved translation quality.
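The stage ordering described above can be illustrated with a minimal Python sketch. This is not the project's actual code: the `Pipeline` and `TextBlock` classes, the stage signatures, and the stub stages are all hypothetical, chosen only to show how detection, OCR, inpainting, and translation hand data to one another and how each engine could be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class TextBlock:
    box: Box
    source_text: str = ""
    translated_text: str = ""


@dataclass
class Pipeline:
    # Each stage is a plain callable, so an engine can be replaced
    # without touching the other stages. All signatures are assumptions.
    detect: Callable[[object], List[Box]]
    ocr: Callable[[object, Box], str]
    inpaint: Callable[[object, List[Box]], object]
    translate: Callable[[List[str]], List[str]]

    def run(self, page):
        # 1. Detect text regions on the page.
        blocks = [TextBlock(box=b) for b in self.detect(page)]
        # 2. Read the source text inside each region.
        for block in blocks:
            block.source_text = self.ocr(page, block.box)
        # 3. Erase the original text from the artwork.
        cleaned = self.inpaint(page, [b.box for b in blocks])
        # 4. Translate all blocks of the page together.
        translations = self.translate([b.source_text for b in blocks])
        for block, text in zip(blocks, translations):
            block.translated_text = text
        return cleaned, blocks


# Stub stages standing in for the real models (purely illustrative).
demo = Pipeline(
    detect=lambda page: [(10, 10, 120, 60)],
    ocr=lambda page, box: "こんにちは",
    inpaint=lambda page, boxes: page,
    translate=lambda texts: ["Hello" for _ in texts],
)
cleaned_page, blocks = demo.run(page="page-001.png")
```

Passing every block of a page to the translation stage in one call is a design choice made for this sketch; it lets a translator see neighbouring dialogue at once, but the source does not say how the application batches text.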

Quick Start & Requirements

  • Install: Clone the repository, install Python 3.12, and use uv for dependency management (uv init --python 3.12, uv add -r requirements.txt --compile-bytecode).
  • Prerequisites: Python 3.12, Git, and uv. Handling CBR files requires WinRAR or 7-Zip on the system PATH. An NVIDIA GPU with CUDA 12.6+ is recommended for PyTorch.
  • API Keys: Required for premium translation (GPT-4o) and OCR (GPT-4o, Google Cloud Vision, Azure Vision).
  • Docs: OpenAI Platform, Google Cloud Vision.
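Before installing, the prerequisites listed above can be checked with a short script. The sketch below is not part of the project; the function name is hypothetical, and the executable names it looks for (`git`, `uv`, `7z`, `WinRAR`) are assumptions that may differ by platform or installation.

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of prerequisites that could not be found."""
    missing = []
    # The summary names Python 3.12 specifically.
    if tuple(version[:2]) != (3, 12):
        missing.append("Python 3.12")
    # Command-line tools expected on PATH.
    for tool in ("git", "uv"):
        if which(tool) is None:
            missing.append(tool)
    # Only needed for CBR archives; executable names are assumptions.
    if not any(which(name) for name in ("7z", "WinRAR")):
        missing.append("7-Zip or WinRAR (CBR only)")
    return missing
```

Calling `missing_prerequisites()` with no arguments inspects the current interpreter and PATH; the `which` and `version` parameters exist only so the check can be exercised without touching the real system.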

Highlighted Details

  • Supports a wide range of comic formats including images, PDF, EPUB, CBR, and CBZ.
  • Leverages SOTA LLMs like GPT-4o for translation, claiming superior performance over traditional services for distant language pairs.
  • Offers manual correction mode for addressing issues in automatic processing.
  • Integrates multiple OCR engines and translation services, allowing users to choose based on cost and quality.
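Of the formats listed above, CBZ is the simplest to read because a CBZ file is an ordinary ZIP archive of page images. The sketch below uses only Python's standard `zipfile` module to list the pages; it is an illustration rather than the application's loader, and the helper name, the extension list, and the sort-by-name ordering are assumptions.

```python
import io
import zipfile

# Image extensions treated as pages (an assumption for this sketch).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def list_cbz_pages(cbz):
    """Return the image member names of a CBZ archive, sorted by name."""
    with zipfile.ZipFile(cbz) as archive:
        names = [
            name for name in archive.namelist()
            if name.lower().endswith(IMAGE_EXTENSIONS)
        ]
    # Name order matches reading order only when pages are zero-padded.
    return sorted(names)


# Build a tiny in-memory archive to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("002.png", b"")
    archive.writestr("001.png", b"")
    archive.writestr("ComicInfo.xml", "<ComicInfo/>")
buffer.seek(0)
pages = list_cbz_pages(buffer)
```

CBR files are RAR archives, which the standard library cannot open; that is why an external tool such as WinRAR or 7-Zip is needed for them.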

Maintenance & Community

The project lists several GitHub repositories in its acknowledgments, indicating reliance on various open-source components. No specific community channels (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project's dependencies include libraries with various licenses, which may impose restrictions on commercial use or redistribution.

Limitations & Caveats

The most advanced features (premium translation and OCR) require API keys for paid services, so using them incurs costs. Choosing a font that supports the target language is critical for correct text rendering. Handling CBR files requires an external archiving tool (WinRAR or 7-Zip) on the system's PATH.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull requests (30d): 5
  • Issues (30d): 6
  • Star history: 268 stars in the last 90 days
