audiobook-creator by prakharsr

Audiobook creator for converting books to speech with voice attribution

Created 7 months ago
334 stars

Top 82.1% on SourcePulse

Project Summary

This project provides an open-source tool for converting books (EPUB, PDF, TXT) into audiobooks with automatic character voice attribution. It is aimed at users who want personalized, multi-voice audiobooks, and it uses NLP and LLM models to work out which character is speaking so that each one can be given its own voice.

How It Works

The tool processes books in three stages: (1) text cleaning and formatting; (2) character identification with GLiNER, followed by LLM-based scoring of each character's gender and age; and (3) audiobook generation with Kokoro TTS. It supports both single-voice and multi-voice narration. In multi-voice mode, each identified character is assigned a distinct voice based on the inferred attributes, so dialogue is read in a voice that fits its speaker.
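The attribute-based voice assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the `Character` class, the 0-to-1 `gender_score` convention, the voice pool, and `assign_voices` are all hypothetical, and the age score is ignored for brevity. The voice IDs follow Kokoro's naming style but are chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical voice pools. The IDs mimic Kokoro's naming style
# ("af_" = American female, "am_" = American male) but are illustrative only.
VOICE_POOL = {
    "female": ["af_heart", "af_bella", "af_nicole"],
    "male": ["am_adam", "am_michael", "am_echo"],
}
NARRATOR_VOICE = "af_sky"  # hypothetical default narrator voice


@dataclass
class Character:
    name: str
    # Hypothetical convention: 0.0 = male ... 1.0 = female,
    # as produced by an LLM scoring step.
    gender_score: float


def assign_voices(characters: list[Character]) -> dict[str, str]:
    """Map each character to a voice, cycling through the pool for its inferred gender."""
    assignments = {"narrator": NARRATOR_VOICE}
    counters = {"female": 0, "male": 0}
    for ch in characters:
        gender = "female" if ch.gender_score >= 0.5 else "male"
        pool = VOICE_POOL[gender]
        # Round-robin so consecutive characters of the same gender get different voices.
        assignments[ch.name] = pool[counters[gender] % len(pool)]
        counters[gender] += 1
    return assignments
```

For example, `assign_voices([Character("Alice", 0.9), Character("Bob", 0.1)])` gives Alice the first female voice and Bob the first male voice, while narration keeps the default voice.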

Quick Start & Requirements

  • Installation: Docker (recommended) or direct run via uv and Python 3.12.
  • Prerequisites:
    • Docker with host networking enabled.
    • An OpenAI-compatible LLM endpoint (e.g., LM Studio).
    • Kokoro TTS model setup (via Kokoro-FastAPI).
    • For direct run: calibre (optional, for M4B) and ffmpeg.
  • Setup: Detailed instructions for Docker and direct installation are provided.
  • Links: Demo Video
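Because Kokoro-FastAPI exposes an OpenAI-style speech endpoint, one chunk of text can be sent to it with the Python standard library alone. The sketch below assumes the server runs at `http://localhost:8880` with a `/v1/audio/speech` route and a model named `kokoro`; these are assumptions about a typical deployment, and `build_speech_request` and `synthesize` are hypothetical helpers, not part of audiobook-creator.

```python
import json
import urllib.request

# Assumption: Kokoro-FastAPI listening locally on port 8880 with an
# OpenAI-compatible /v1 prefix. Adjust to match your deployment.
BASE_URL = "http://localhost:8880/v1"


def build_speech_request(text: str, voice: str = "af_heart", fmt: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request asking the TTS server to speak one chunk."""
    payload = {
        "model": "kokoro",  # assumed model name for the OpenAI-compatible API
        "input": text,
        "voice": voice,
        "response_format": fmt,
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str = "af_heart", fmt: str = "mp3") -> bytes:
    """Send the request and return the raw audio bytes (needs a running server)."""
    with urllib.request.urlopen(build_speech_request(text, voice, fmt), timeout=120) as resp:
        return resp.read()
```

Separating request construction from sending makes the payload easy to inspect before a server is available.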

Highlighted Details

  • Gradio UI for user-friendly audiobook creation.
  • M4B audiobook creation with metadata and chapter timestamps.
  • Multi-format input (EPUB, PDF, TXT) and output (AAC, M4A, MP3, WAV, OPUS, FLAC, PCM, M4B).
  • Supports parallel batch inference with Kokoro TTS for faster audio generation.

Maintenance & Community

  • Contributions are welcome via GitHub issues and pull requests.
  • Donations are accepted via PayPal.
  • GitHub Repository

Licensing & Compatibility

  • Licensed under GNU General Public License v3.0 (GPL-3.0).
  • GPL-3.0 is a strong copyleft license: derivative works that are distributed must be released under the same license, with their source code available. Commercial use is permitted, but combining the code with closed-source software may conflict with these obligations and should be reviewed carefully.

Limitations & Caveats

  • GPU inference is CUDA-based, so Apple Silicon GPUs are not currently supported; on those machines, CPU inference is the alternative.
  • The effectiveness of character voice attribution relies on the accuracy of the underlying NLP and LLM models.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 20 stars in the last 30 days
