translate-book  by deusyu

AI-powered book translation pipeline

Created 3 weeks ago

604 stars

Top 54.1% on SourcePulse

Project Summary

This project provides a Claude Code skill for translating entire books (PDF, DOCX, EPUB) into various languages using a distributed, parallel subagent architecture. It is aimed at users who need efficient, robust, and accurate book translation. Compared with translating a whole book in a single session, it avoids context truncation and lets interrupted runs resume where they stopped.

How It Works

The core idea is to restructure book translation as a Claude Code Skill. Input documents are converted to Markdown and split into manageable chunks. Each chunk is translated by an independent subagent with a fresh context window, which allows parallel translation (8 concurrent agents by default) and prevents context accumulation and output truncation. A manifest records a hash of each chunk so the pipeline can verify integrity before merging the translated chunks. The merged result is then converted into multiple output formats (HTML, DOCX, EPUB, PDF) using Pandoc and Calibre.
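To make the chunking step concrete, here is a minimal sketch of splitting Markdown into size-bounded chunks at paragraph boundaries. The function name split_markdown and the max_chars budget are hypothetical; translate-book's actual splitting rules and chunk size may differ.

```python
import re


def split_markdown(text: str, max_chars: int = 6000) -> list[str]:
    """Split Markdown into chunks of at most max_chars, breaking only at blank lines.

    Hypothetical helper: the real skill's chunking rules and size limit may differ.
    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in paragraphs:
        # +2 accounts for the blank-line separator re-inserted when joining
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

For example, calling split_markdown on a small two-chapter sample with max_chars=30 yields three chunks, none longer than 30 characters, and joining them with blank lines reproduces the input. Breaking only at blank lines keeps headings and paragraphs intact, so each subagent sees complete units of text.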

Quick Start & Requirements

  • Primary install command: npx skills add deusyu/translate-book -a claude-code -g (recommended). Alternatives are clawhub install translate-book or cloning the Git repository. Translation is started inside Claude Code with translate /path/to/book.pdf to [language].
  • Non-default prerequisites: Claude Code CLI (installed and authenticated), Calibre (with ebook-convert in PATH), Pandoc, Python 3 with pypandoc (pip install pypandoc), and optionally beautifulsoup4 (pip install beautifulsoup4).
  • Links: Calibre download, Pandoc download.
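Before a first run, it can help to confirm that the prerequisites listed above are visible from Python. The sketch below is illustrative only: check_prerequisites is a hypothetical helper, and it assumes the Claude Code CLI is on PATH as claude. It checks presence only, not authentication or versions.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report which external prerequisites are visible from this Python environment.

    Sketch only: checks that executables are on PATH and modules are importable,
    not that Claude Code is authenticated or that versions are compatible.
    """
    return {
        "claude": shutil.which("claude") is not None,  # Claude Code CLI (assumed binary name)
        "ebook-convert": shutil.which("ebook-convert") is not None,  # ships with Calibre
        "pandoc": shutil.which("pandoc") is not None,
        "pypandoc": importlib.util.find_spec("pypandoc") is not None,
        "bs4 (optional)": importlib.util.find_spec("bs4") is not None,  # beautifulsoup4
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```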

Highlighted Details

  • Parallel subagents (8 concurrent translators per batch) with isolated context windows.
  • Resumable translation at the chunk level; already-translated chunks are skipped on re-runs.
  • Manifest validation using SHA-256 hashes ensures integrity and prevents stale or corrupt outputs.
  • Multi-format output generation: HTML (with floating TOC), DOCX, EPUB, and PDF.
  • Supports multiple languages (zh, en, ja, ko, fr, de, es) and is extensible.
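The parallel and resumable behaviour can be sketched with a thread pool that processes pending chunks in batches of 8 and skips any chunk whose output file already exists. This is a minimal illustration, not the skill's implementation: translate_chunk is a hypothetical stand-in for a Claude Code subagent, and the chunk_NNNN.md file naming is assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

BATCH_SIZE = 8  # mirrors the README's 8 concurrent translators per batch


def translate_chunk(text: str, target_lang: str) -> str:
    # Hypothetical stand-in: translate-book hands each chunk to a Claude Code subagent.
    return f"[{target_lang}] {text}"


def output_path(out_dir: Path, i: int) -> Path:
    # Hypothetical naming scheme for per-chunk output files.
    return out_dir / f"chunk_{i:04d}.md"


def translate_pending(chunks: list[str], out_dir: Path, target_lang: str) -> list[int]:
    """Translate only chunks that have no output file yet; return the indices translated."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pending = [i for i in range(len(chunks)) if not output_path(out_dir, i).exists()]

    def work(i: int) -> int:
        target = output_path(out_dir, i)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(translate_chunk(chunks[i], target_lang), encoding="utf-8")
        # Publish only complete files, so an interrupted run never leaves a half-written "done" chunk
        tmp.replace(target)
        return i

    done: list[int] = []
    with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
        for start in range(0, len(pending), BATCH_SIZE):
            batch = pending[start:start + BATCH_SIZE]
            # extend() consumes the map iterator, so this waits for the whole batch
            done.extend(pool.map(work, batch))
    return done
```

Running translate_pending twice on the same directory translates everything the first time and nothing the second; deleting one output file causes only that chunk to be redone on the next run.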

Maintenance & Community

No specific details regarding contributors, sponsorships, or community channels (e.g., Discord, Slack) are provided in the README.

Licensing & Compatibility

The project is released under the MIT License, which permits commercial use and integration into closed-source projects provided the copyright and license notice are retained.

Limitations & Caveats

Successful operation requires installing and correctly configuring external tools: Claude Code CLI, Calibre, and Pandoc. To apply minor changes (e.g., metadata or templates) to an already-translated book, you must either start a fresh run or manually delete the existing output artifacts. A manifest validation failure indicates that source chunks have been altered since the initial conversion. PDF output depends on Calibre's PDF conversion.
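The manifest check can be pictured as follows. This is a minimal sketch, assuming a manifest that stores one SHA-256 digest per source chunk; the layout and the helper names build_manifest, stale_chunks, and merge_if_valid are hypothetical, not translate-book's actual format.

```python
import hashlib


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(chunks: list[str]) -> dict:
    """Record one SHA-256 digest per source chunk (hypothetical manifest layout)."""
    return {"chunks": [{"index": i, "sha256": sha256_text(c)} for i, c in enumerate(chunks)]}


def stale_chunks(manifest: dict, chunks: list[str]) -> list[int]:
    """Indices whose current source text no longer matches the manifest.

    If the number of chunks changed, every index is reported as stale.
    """
    recorded = [entry["sha256"] for entry in manifest["chunks"]]
    if len(recorded) != len(chunks):
        return list(range(max(len(recorded), len(chunks))))
    return [i for i, c in enumerate(chunks) if sha256_text(c) != recorded[i]]


def merge_if_valid(manifest: dict, source: list[str], translated: list[str]) -> str:
    """Refuse to merge when sources changed or translations are missing."""
    stale = stale_chunks(manifest, source)
    if stale:
        raise ValueError(f"source chunks changed since conversion: {stale}")
    if len(translated) != len(source):
        raise ValueError("some chunks have not been translated yet")
    return "\n\n".join(translated)
```

Failing loudly here, rather than merging whatever files exist, is what keeps stale or partial translations out of the final HTML, DOCX, EPUB, and PDF outputs.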

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 5
Issues (30d): 1
Star History: 610 stars in the last 27 days
