DrewThomasson
Convert ebooks to audiobooks with advanced TTS and voice cloning
Top 2.7% on SourcePulse
Summary
ebook2audiobook converts e-books into audiobooks. It supports over 1,158 languages and offers advanced text-to-speech (TTS) capabilities, including voice cloning. It targets users who want to create personalized audiobooks from a range of digital text formats.
How It Works
The project uses multiple TTS engines (XTTSv2, Bark, VITS, etc.) to generate speech from a range of e-book formats (.epub, .pdf, .mobi, .txt, etc.). It supports OCR for image-based documents and gives fine-grained control over audio output through SML tags for pauses, breaks, and voice switching. Voice cloning lets users narrate in their own voice.
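To make the tag-based control concrete, here is a minimal Python sketch of how inline pause and break tags might be separated from narrated text before synthesis. The bracketed `[pause]` and `[break]` spellings, the `split_tags` function, and the segment tuples are all assumptions made for illustration; they are not taken from the project's code, so check the repository's README for the actual tag syntax.

```python
import re

# Hypothetical tag spellings, assumed for illustration only.
TAG_PATTERN = re.compile(r"\[(pause|break)\]")


def split_tags(text):
    """Split text into ("text", str) and ("tag", name) segments, in order."""
    segments = []
    position = 0
    for match in TAG_PATTERN.finditer(text):
        # Text between the previous tag (or the start) and this tag.
        chunk = text[position:match.start()].strip()
        if chunk:
            segments.append(("text", chunk))
        segments.append(("tag", match.group(1)))
        position = match.end()
    # Any text left after the last tag.
    tail = text[position:].strip()
    if tail:
        segments.append(("text", tail))
    return segments
```

A caller could then send each `"text"` segment to a TTS engine and insert silence for each `"tag"` segment; how long a pause or break lasts is left open here because the summary does not say.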
Quick Start & Requirements
Installation involves cloning the repository and running the provided scripts (.command for Linux/macOS, .cmd for Windows) or using Docker. The minimum hardware requirements are 2GB RAM and 1GB VRAM; 8GB RAM and 4GB VRAM are recommended. Supported devices are CPU, CUDA, ROCm, MPS, XPU, and Jetson. Modern TTS engines perform significantly better on GPUs. Links to demos are available via GitHub assets.
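The hardware figures above can be turned into a quick pre-flight check. The sketch below is not part of the project; `hardware_tier` is a hypothetical helper, and treating a missing VRAM value as a CPU-only machine (so that only RAM is checked) is an assumption about how the requirements apply.

```python
# Figures quoted in the requirements above.
MIN_RAM_GB, MIN_VRAM_GB = 2, 1
REC_RAM_GB, REC_VRAM_GB = 8, 4


def hardware_tier(ram_gb, vram_gb=None):
    """Classify a machine as "recommended", "minimum", or "below minimum".

    vram_gb=None means no GPU is used (assumption: only RAM is checked then).
    """
    def meets(ram_needed, vram_needed):
        if ram_gb < ram_needed:
            return False
        return vram_gb is None or vram_gb >= vram_needed

    if meets(REC_RAM_GB, REC_VRAM_GB):
        return "recommended"
    if meets(MIN_RAM_GB, MIN_VRAM_GB):
        return "minimum"
    return "below minimum"
```

For example, a machine with 4GB RAM and 2GB VRAM lands in the "minimum" tier, so conversion should run but is likely to be slower than on the recommended configuration.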
Highlighted Details
Maintenance & Community
The project is under active development and publishes a detailed roadmap that includes parallel conversion, sentence editing, and integration of more TTS engines. Contributions for language support and model improvements are encouraged. The GitHub repository is the primary hub for community interaction and development.
Licensing & Compatibility
The repository's README does not explicitly state a software license. Users should exercise caution regarding usage rights and potential compatibility issues with closed-source projects until a license is clarified.
Limitations & Caveats
The tool is intended strictly for non-DRM, legally acquired eBooks. EPUB files may require manual text cleanup due to inconsistent chapter structuring. Performance on CPU is significantly slower for advanced TTS models. Apple Silicon's MPS is not exposed in Docker containers, necessitating CPU usage within Docker. The enable_text_splitting option is noted as inefficient. Custom model uploads are currently limited to XTTSv2.
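The Docker caveat for Apple Silicon amounts to a simple fallback rule, sketched below in Python. Both function names are hypothetical, and checking for `/.dockerenv` is only a common heuristic for detecting a Docker container, not something the project is known to do.

```python
import os


def running_in_docker():
    """Heuristic: Docker usually creates /.dockerenv inside containers."""
    return os.path.exists("/.dockerenv")


def resolve_device(requested, in_docker=None):
    """Return the device to use, falling back to CPU for MPS inside Docker."""
    if in_docker is None:
        in_docker = running_in_docker()
    requested = requested.lower()
    # MPS is not exposed inside Docker containers, so use the CPU instead.
    if requested == "mps" and in_docker:
        return "cpu"
    return requested
```

Passing `in_docker` explicitly keeps the rule easy to test; leaving it out lets the heuristic decide at run time.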