abogen  by denizsafak

Generate audiobooks from documents with synchronized captions

Created 4 months ago
3,565 stars

Top 13.6% on SourcePulse

Project Summary

Abogen is a text-to-speech tool that converts EPUB, PDF, and text files into audiobooks with synchronized captions. It targets content creators and audiobook enthusiasts, and uses the Kokoro-82M model to produce natural-sounding speech with customizable subtitles.

How It Works

Abogen reads an input file (EPUB, PDF, or TXT) and generates audio with the Kokoro-82M text-to-speech model. Subtitle synchronization options range from word-level to sentence-level highlighting. A voice mixer creates custom voices, and chapter splitting for e-books either saves each chapter as a separate audio file or merges all chapters into a single audiobook.
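
The difference between word-level and sentence-level captions can be illustrated with a minimal sketch. It assumes the speech model yields one (text, start, end) timing per word; the `Word` class and the helper functions are hypothetical names for this illustration, not Abogen's or Kokoro's API. The sketch writes captions in the common SRT layout.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One spoken word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def group_sentences(words):
    """Group words into sentences, closing a group at '.', '!' or '?'."""
    groups, current = [], []
    for w in words:
        current.append(w)
        if w.text.endswith((".", "!", "?")):
            groups.append(current)
            current = []
    if current:  # trailing words without closing punctuation
        groups.append(current)
    return groups


def to_srt(groups) -> str:
    """Render each group of words as one numbered SRT caption."""
    blocks = []
    for i, group in enumerate(groups, start=1):
        text = " ".join(w.text for w in group)
        span = f"{srt_time(group[0].start)} --> {srt_time(group[-1].end)}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    return "\n".join(blocks)


words = [
    Word("Hello", 0.0, 0.4),
    Word("world.", 0.4, 0.9),
    Word("Second", 1.2, 1.6),
    Word("sentence.", 1.6, 2.3),
]

sentence_srt = to_srt(group_sentences(words))  # one caption per sentence
word_srt = to_srt([[w] for w in words])        # one caption per word
```

Both outputs come from the same timing data; only the grouping changes, which is why one set of word timings can support several highlighting granularities.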

Quick Start & Requirements

  • Installation: Varies by OS. On Windows, run WINDOWS_INSTALL.bat for an automated setup that includes CUDA, or use pip install abogen (requires Python 3.10-3.12). On Mac and Linux, install espeak-ng first, then run pip3 install abogen. Separate instructions cover NVIDIA and AMD GPUs on Linux.
  • Prerequisites: espeak-ng is required on all platforms. The Windows install script sets up CUDA automatically. On Mac (M1/M2), install Kokoro's development version, which has MPS support (pip3 install git+https://github.com/hexgrad/kokoro.git).
  • Setup Time: The automated Windows installation is quick. Pip installations take as long as the packages need to download and build.
  • Links: Demo available at https://github.com/user-attachments/assets/094ba3df-7d66-494a-bc31-0e4b41d0b865.
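
Before running pip, the two prerequisites named above can be checked with a short script. This is a sketch under stated assumptions: it takes the supported range to be Python 3.10 through 3.12 as listed, and it assumes espeak-ng is installed as an executable named `espeak-ng` on the PATH. The function names are hypothetical and are not part of Abogen.

```python
import shutil
import sys

# Supported interpreter range as stated in the installation notes.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def python_supported(version=None) -> bool:
    """Return True if the (major, minor) version is within the stated range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def espeak_available() -> bool:
    """Return True if an 'espeak-ng' executable is found on the PATH."""
    return shutil.which("espeak-ng") is not None


def check_prerequisites() -> list:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_supported():
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"Python {found} is outside the 3.10-3.12 range")
    if not espeak_available():
        problems.append("espeak-ng was not found on the PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Prerequisites look fine; pip install abogen should be possible.")
```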

Highlighted Details

  • Generates audiobooks with synchronized captions from various document formats.
  • Supports custom voice creation via a "Voice Mixer" by blending different voice models.
  • Offers batch processing through a queue mode for multiple files.
  • Handles chapter splitting and metadata tagging for e-books, enabling M4B output with chapters.
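
One way to picture blending voices is a weighted average of per-voice vectors. The sketch below is an assumption made for illustration only: the source does not describe how the Voice Mixer combines voices, and `mix_voices`, the vector representation, and the sample values are all hypothetical.

```python
def mix_voices(voices, weights):
    """Return the weighted average of equal-length voice vectors.

    Hypothetical illustration: each voice is a plain list of floats,
    and weights are normalized so that they sum to 1.
    """
    if not voices or len(voices) != len(weights):
        raise ValueError("need one weight per voice and at least one voice")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(voices[0])
    if any(len(v) != dim for v in voices):
        raise ValueError("all voice vectors must have the same length")
    return [
        sum(w * v[i] for v, w in zip(voices, weights)) / total
        for i in range(dim)
    ]


# Made-up vectors standing in for two voices.
voice_a = [1.0, 0.0, 2.0]
voice_b = [0.0, 1.0, 4.0]

# Three parts voice_a to one part voice_b.
mixed = mix_voices([voice_a, voice_b], [3, 1])
```

Normalizing by the weight total means the weights can be given as any convenient ratio, such as slider positions, rather than as fractions that already sum to 1.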

Maintenance & Community

The project welcomes contributions via pull requests and credits specific contributors for features such as chapter support, voice mixing, and subtitle highlighting. The README does not link to any community channels.

Licensing & Compatibility

Abogen is released under the MIT License. The underlying Kokoro model is licensed under Apache-2.0, which permits commercial use, modification, and distribution.

Limitations & Caveats

Subtitle generation is currently limited to English, a constraint of Kokoro's timestamp token support. Inside the Docker container, audio preview does not work, and neither do the settings options that open directories.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 5
  • Issues (30d): 14
  • Star History: 717 stars in the last 30 days
