This project provides an open-source tool for converting books (EPUB, PDF, TXT) into audiobooks with intelligent character voice attribution. It targets users who want to create personalized, multi-voice audiobooks using NLP and LLM technologies, offering an engaging listening experience.
How It Works
The tool processes books in three stages: text cleaning and formatting, character identification using GLiNER and LLMs for gender/age scoring, and audiobook generation with Kokoro TTS. It supports both single-voice and multi-voice narration, assigning distinct voices to identified characters based on inferred attributes. This approach enables dynamic, character-aware audiobooks.
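The three stages above can be sketched roughly as follows. This is an illustrative outline only, not the project's actual API: the function names, the `Character` fields, and the voice IDs are assumptions made for the sketch.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of the three-stage pipeline: clean text,
# identify characters, assign voices. Stages 2 and 3 are stubbed;
# in the real tool, GLiNER extracts names and an LLM scores
# gender/age before Kokoro TTS renders the audio.

@dataclass
class Character:
    name: str
    gender: str      # inferred by the LLM scoring stage
    age_group: str   # e.g. "adult", "child"

def clean_text(raw: str) -> str:
    """Stage 1: normalize whitespace and strip formatting artifacts."""
    return re.sub(r"\s+", " ", raw).strip()

def identify_characters(text: str) -> list[Character]:
    """Stage 2 (stubbed): return characters with inferred attributes."""
    return [Character(name="Alice", gender="female", age_group="adult")]

def assign_voices(characters: list[Character]) -> dict[str, str]:
    """Stage 3 (partial): map inferred attributes to a distinct voice.
    The voice IDs here are placeholders, not a confirmed Kokoro roster."""
    table = {("female", "adult"): "af_voice", ("male", "adult"): "am_voice"}
    return {c.name: table.get((c.gender, c.age_group), "af_voice")
            for c in characters}

text = clean_text("  Alice   said hello.\n")
voices = assign_voices(identify_characters(text))
```

The key design point the sketch preserves is that voice assignment is driven by inferred attributes, so every identified character gets a stable, distinct voice across the whole book.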
Quick Start & Requirements
- Installation: Docker (recommended) or direct run via `uv` and Python 3.12.
- Prerequisites:
- Docker with host networking enabled.
- An OpenAI-compatible LLM endpoint (e.g., LM Studio).
- Kokoro TTS model setup (via Kokoro-FastAPI).
- For direct run: `calibre` (optional, for M4B output) and `ffmpeg`.
- Setup: Detailed instructions for both Docker and direct installation are provided in the project documentation.
- Links: Demo Video
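Because the tool talks to any OpenAI-compatible endpoint, a character-scoring request is just a standard chat-completions payload. The sketch below shows one plausible shape; the endpoint URL, model name, and prompt wording are assumptions for illustration, not the project's actual prompts (no network call is made here).

```python
import json

# Assumed default local address for an LM Studio style server.
ENDPOINT = "http://localhost:1234/v1/chat/completions"

def build_scoring_request(character: str, excerpt: str) -> dict:
    """Build a chat-completions payload asking the LLM to score a
    character's likely gender and age group from a text excerpt."""
    return {
        "model": "local-model",  # placeholder; the server uses its loaded model
        "messages": [
            {"role": "system",
             "content": "Given a character name and an excerpt, reply with "
                        'JSON: {"gender": ..., "age_group": ...}.'},
            {"role": "user",
             "content": f"Character: {character}\nExcerpt: {excerpt}"},
        ],
        "temperature": 0.0,  # deterministic scoring
    }

payload = build_scoring_request("Alice", "Alice laughed and ran ahead.")
body = json.dumps(payload)  # this JSON is what would be POSTed to ENDPOINT
```

Pointing `ENDPOINT` at any server that speaks the OpenAI API (LM Studio, or similar) is what makes the LLM backend swappable.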
Highlighted Details
- Gradio UI for user-friendly audiobook creation.
- M4B audiobook creation with metadata and chapter timestamps.
- Multi-format input (EPUB, PDF, TXT) and output (AAC, M4A, MP3, WAV, OPUS, FLAC, PCM, M4B).
- Supports parallel batch inferencing for faster audio generation via Kokoro TTS.
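The parallel batch inferencing mentioned above can be sketched with a thread pool: many independent TTS requests run concurrently, while output order is preserved so chunks concatenate correctly. `synthesize()` is a stand-in stub (a real implementation would POST to the Kokoro-FastAPI server); the batching logic is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(segment: tuple[str, str]) -> bytes:
    """Stub for one TTS call: (voice_id, text) -> audio bytes.
    Hypothetical placeholder for a request to a Kokoro-FastAPI server."""
    voice, text = segment
    return f"{voice}:{text}".encode()  # placeholder "audio"

def synthesize_batch(segments: list[tuple[str, str]],
                     workers: int = 4) -> list[bytes]:
    """Run TTS calls concurrently. pool.map preserves input order,
    so the audio chunks can be concatenated into the final audiobook
    without reordering."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize, segments))

audio = synthesize_batch([("voice_a", "Hello."), ("voice_b", "Hi there.")])
```

A thread pool (rather than multiprocessing) fits here because each call is I/O-bound: the worker mostly waits on the TTS server, which does the heavy inference.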
Maintenance & Community
- Contributions are welcome via GitHub issues and pull requests.
- Donations are accepted via PayPal.
- GitHub Repository
Licensing & Compatibility
- Licensed under GNU General Public License v3.0 (GPL-3.0).
- GPL-3.0 is a strong copyleft license, requiring derivative works to also be open-sourced under the same license. Commercial use or linking with closed-source software may require careful consideration of license obligations.
Limitations & Caveats
- GPU-accelerated inference is CUDA-based (NVIDIA-only), so Apple Silicon GPUs are not currently supported; CPU inference is the fallback on Apple hardware.
- The effectiveness of character voice attribution relies on the accuracy of the underlying NLP and LLM models.