ebook2audiobook by DrewThomasson

Convert ebooks to audiobooks with advanced TTS and voice cloning

Created 2 years ago

19,508 stars

Top 2.7% on SourcePulse

3 Experts Love This Project

Jason Miller

Author of Preact

Dan Guido

Cofounder of Trail of Bits

Abubakar Abid

Cofounder of Gradio

Project Summary

ebook2audiobook converts e-books into audiobooks using text-to-speech (TTS) engines, with optional voice cloning and support for more than 1158 languages. It targets users who want to create personalized audiobooks from a range of digital text formats.

How It Works

The project leverages multiple TTS engines (XTTSv2, Bark, VITS, etc.) to generate speech from diverse e-book formats (.epub, .pdf, .mobi, .txt, etc.). It supports OCR for image-based documents and allows for fine-grained control over audio output using SML tags for pauses, breaks, and voice switching. Voice cloning enables users to use their own voice for narration, enhancing personalization.

Quick Start & Requirements

Installation involves cloning the repository and running provided scripts (.command for Linux/macOS, .cmd for Windows) or using Docker. Minimum hardware requirements are 2GB RAM and 1GB VRAM (8GB RAM, 4GB VRAM recommended), with support for CPU, CUDA, ROCm, MPS, XPU, and JETSON devices. Modern TTS engines perform significantly better on GPUs. Links to demos are available via GitHub assets.
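As a rough illustration of the thresholds above, the following Python sketch classifies a machine against the stated minimum (2GB RAM, 1GB VRAM) and recommended (8GB RAM, 4GB VRAM) figures. The function name classify_hardware and its return labels are hypothetical and are not part of ebook2audiobook; only the numbers come from this summary. The sketch covers the GPU case only and does not model a CPU-only machine.

```python
# Thresholds in GB, copied from the requirements stated in this summary.
MINIMUM = {"ram": 2, "vram": 1}
RECOMMENDED = {"ram": 8, "vram": 4}


def classify_hardware(ram_gb: float, vram_gb: float) -> str:
    """Classify a machine against the stated thresholds.

    Hypothetical helper for illustration; the labels are not project terms.
    """
    if ram_gb >= RECOMMENDED["ram"] and vram_gb >= RECOMMENDED["vram"]:
        return "recommended"
    if ram_gb >= MINIMUM["ram"] and vram_gb >= MINIMUM["vram"]:
        return "minimum"
    return "below-minimum"


if __name__ == "__main__":
    # Example: a laptop with 4GB RAM and a 2GB GPU.
    print(classify_hardware(4, 2))
```

A machine that only meets the minimum can still be expected to run the tool, but the summary indicates that modern TTS engines benefit substantially from a more capable GPU.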

Highlighted Details

Extensive language support: 1158+ languages and dialects.
Broad input format compatibility: .epub, .pdf, .mobi, .txt, .html, .rtf, .doc, .docx, and image-based formats via OCR.
Advanced TTS features: voice cloning and SML tags for detailed speech control (pauses, breaks, voice changes).
Multiple output formats: .m4b, .m4a, .mp3, .wav, .ogg, .webm, etc.
Low-resource friendly: Operates with as little as 2GB RAM and 1GB VRAM.
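
The format lists above can be expressed as a simple pre-flight check before starting a conversion. This is a minimal sketch: the helper names are hypothetical, and the extension sets contain only the formats named in this summary, which ends both lists with "etc.", so the real tool may accept more.

```python
from pathlib import Path

# Only the extensions named in this summary; the project may support others.
LISTED_INPUTS = {".epub", ".pdf", ".mobi", ".txt", ".html", ".rtf", ".doc", ".docx"}
LISTED_OUTPUTS = {".m4b", ".m4a", ".mp3", ".wav", ".ogg", ".webm"}


def is_listed_input(path: str) -> bool:
    """True if the file extension is one of the listed input formats."""
    return Path(path).suffix.lower() in LISTED_INPUTS


def is_listed_output(path: str) -> bool:
    """True if the file extension is one of the listed output formats."""
    return Path(path).suffix.lower() in LISTED_OUTPUTS


if __name__ == "__main__":
    print(is_listed_input("novel.EPUB"), is_listed_output("novel.m4b"))
```

The check is case-insensitive on the extension and looks only at the final suffix, so it says nothing about whether the file content is valid or DRM-free.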

Maintenance & Community

The project is under active development, with a detailed roadmap that includes parallel conversion, sentence editing, and integration of more TTS engines. Contributions for language support and model improvements are encouraged. The GitHub repository is the primary hub for community interaction and development.

Licensing & Compatibility

The repository's README does not explicitly state a software license. Users should exercise caution regarding usage rights and potential compatibility issues with closed-source projects until a license is clarified.

Limitations & Caveats

The tool is intended strictly for non-DRM, legally acquired e-books. EPUB files may need manual text cleanup because their chapter structure is inconsistent. Advanced TTS models run significantly slower on CPU. Apple Silicon's MPS backend is not exposed inside Docker containers, so Docker runs on those machines must use the CPU. The enable_text_splitting option is noted as inefficient. Custom model uploads are currently limited to XTTSv2.
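
The Docker caveat for MPS can be pictured as a small fallback rule. The sketch below is an assumption-laden illustration, not the project's code: the function resolve_device, the in_docker flag, and the lowercase device identifiers are all hypothetical, and only the device names and the MPS-in-Docker restriction come from this summary.

```python
# Device names as listed in this summary, lowercased here by assumption.
SUPPORTED_DEVICES = {"cpu", "cuda", "rocm", "mps", "xpu", "jetson"}


def resolve_device(requested: str, in_docker: bool) -> str:
    """Return the device to use, falling back to CPU for MPS inside Docker.

    Hypothetical helper; the real tool's selection logic may differ.
    """
    device = requested.lower()
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unknown device: {requested!r}")
    # MPS is not exposed inside Docker containers, so only CPU is usable there.
    if device == "mps" and in_docker:
        return "cpu"
    return device


if __name__ == "__main__":
    print(resolve_device("MPS", in_docker=True))
```

Given the CPU slowdown noted above, users on Apple Silicon may prefer a native install over Docker when they want MPS acceleration.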

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Star History

296 stars in the last 30 days