abogen by denizsafak

Generate audiobooks from documents with synchronized captions

Created 10 months ago

4,142 stars

Top 11.7% on SourcePulse

Project Summary

Abogen is a text-to-speech tool that converts EPUB, PDF, and text files into audiobooks with synchronized captions. It is aimed at content creators and audiobook enthusiasts, and uses the Kokoro-82M model to produce natural-sounding speech with customizable subtitles.

How It Works

Abogen reads input files (EPUB, PDF, TXT) and uses the Kokoro-82M text-to-speech model to generate audio. Subtitles can be synchronized at several granularities, from word-level to sentence-level highlighting. The tool also includes a voice mixer for creating custom voices, and it can split e-books by chapter, saving each chapter as a separate audio file or merging them into a single audiobook.
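To make the idea of synchronized captions concrete, here is a minimal sketch of how timed segments could be turned into an SRT subtitle file. It is not Abogen's implementation: the `Segment` class and the `format_timestamp` and `build_srt` helpers are hypothetical names, and the sketch assumes each segment (a word or a sentence) comes with a known audio duration and that segments play back to back with no gaps.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One spoken unit (a word or a sentence) and its audio duration in seconds.

    Hypothetical type for illustration; not part of Abogen or Kokoro.
    """
    text: str
    duration: float


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments: list[Segment]) -> str:
    """Lay segments end to end and emit one numbered SRT cue per segment."""
    cues = []
    start = 0.0
    for index, segment in enumerate(segments, start=1):
        end = start + segment.duration
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{segment.text}\n"
        )
        start = end
    # A blank line separates consecutive cues in the SRT format.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment("Call me Ishmael.", 1.5),
        Segment("Some years ago, never mind how long.", 2.75),
    ]
    print(build_srt(demo))
```

Passing one segment per word gives word-level highlighting, and one segment per sentence gives sentence-level captions. The cue-building logic is the same in both cases.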

Quick Start & Requirements

Installation: Varies by OS. On Windows, run WINDOWS_INSTALL.bat for an automated setup that includes CUDA. Alternatively, install with pip install abogen (requires Python 3.10-3.12). On Mac and Linux, install espeak-ng first, then run pip3 install abogen. The README gives separate instructions for NVIDIA and AMD GPUs on Linux.
Prerequisites: espeak-ng is required on all platforms. On Windows, the install script sets up CUDA automatically. On Mac (M1/M2), Kokoro's development version, which adds MPS support, must be installed (pip3 install git+https://github.com/hexgrad/kokoro.git).
Setup Time: Automated Windows installation is quick. Pip installations depend on package download and build times.
Links: Demo available at https://github.com/user-attachments/assets/094ba3df-7d66-494a-bc31-0e4b41d0b865.
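The two hard requirements listed above (Python 3.10-3.12 and espeak-ng) can be checked before running pip. The following is a minimal sketch using only the standard library; `python_supported` and `find_espeak` are hypothetical helper names, not part of Abogen, and the check assumes the espeak-ng executable is on the PATH under the name `espeak-ng`.

```python
import shutil
import sys

# Version range taken from the requirement stated above (Python 3.10-3.12).
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def python_supported(version=None) -> bool:
    """Return True if the (major, minor) version falls inside the supported range.

    `version` defaults to the running interpreter; pass a tuple such as
    (3, 11, 4) to check a different version.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def find_espeak():
    """Return the path to the espeak-ng executable, or None if it is not on PATH."""
    return shutil.which("espeak-ng")


if __name__ == "__main__":
    if not python_supported():
        print("Unsupported Python version:", sys.version.split()[0])
    espeak_path = find_espeak()
    if espeak_path is None:
        print("espeak-ng was not found on PATH; install it before installing abogen.")
    else:
        print("espeak-ng found at", espeak_path)
```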

Highlighted Details

Generates audiobooks with synchronized captions from various document formats.
Supports custom voice creation via a "Voice Mixer" by blending different voice models.
Offers batch processing through a queue mode for multiple files.
Handles chapter splitting and metadata tagging for e-books, enabling M4B output with chapters.
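One plausible reading of "blending different voice models" is a weighted average of per-voice embedding vectors. The sketch below illustrates that idea with plain Python lists. It is an assumption for illustration, not Abogen's actual Voice Mixer code: `mix_voices`, the voice names, and the toy two-dimensional vectors are all hypothetical.

```python
def mix_voices(voices: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Blend voice vectors as a weighted average, normalizing weights to sum to 1.

    `voices` maps a voice name to its embedding vector; `weights` maps the
    names to mix to their relative strengths. Both are hypothetical inputs.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    names = list(weights)
    length = len(voices[names[0]])
    if any(len(voices[name]) != length for name in names):
        raise ValueError("all voice vectors must have the same length")

    # Each output component is the weight-normalized sum of the inputs.
    return [
        sum(weights[name] * voices[name][i] for name in names) / total
        for i in range(length)
    ]


if __name__ == "__main__":
    # Toy vectors; real voice embeddings would be much longer.
    voices = {"voice_a": [0.0, 2.0], "voice_b": [4.0, 6.0]}
    blended = mix_voices(voices, {"voice_a": 1.0, "voice_b": 3.0})
    print(blended)
```

Normalizing by the total weight means the sliders only need to express relative strength, so weights of 1 and 3 behave the same as 0.25 and 0.75.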

Maintenance & Community

The project welcomes contributions via pull requests. The README credits individual contributors for features such as chapter support, voice mixing, and subtitle highlighting. It does not link to any community channels.

Licensing & Compatibility

Abogen is released under the MIT License. The underlying Kokoro model is licensed under Apache-2.0, which permits commercial use, modification, and distribution.

Limitations & Caveats

Subtitle generation is currently limited to English, a restriction attributed to Kokoro's timestamp token support. In the Docker container, audio preview does not work, and neither do the directory-opening options in settings.

Health Check

Last Commit

6 days ago

Responsiveness

Inactive

Star History

78 stars in the last 30 days