audiobook-creator  by prakharsr

Audiobook creator for converting books to speech with voice attribution

created 5 months ago
309 stars

Top 88.0% on sourcepulse

Project Summary

This open-source tool converts books (EPUB, PDF, TXT) into audiobooks and assigns distinct voices to the characters it identifies. It targets users who want personalized, multi-voice audiobooks built with NLP and LLM technologies.

How It Works

The tool processes books in three stages: text cleaning and formatting; character identification with GLiNER, followed by LLM-based gender and age scoring; and audiobook generation with Kokoro TTS. It supports both single-voice and multi-voice narration. In multi-voice mode, each identified character is assigned a distinct voice based on the inferred attributes.
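
The voice-assignment step can be illustrated with a minimal sketch. It is not the project's implementation: the `Character` fields, the 0.0 to 1.0 score scales, the 0.5 thresholds, and every voice identifier are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Character:
    name: str
    gender_score: float  # assumed scale: 0.0 = feminine, 1.0 = masculine
    age_score: float     # assumed scale: 0.0 = child, 1.0 = elderly


# Placeholder voice identifiers, not real Kokoro voice names.
VOICE_POOLS = {
    ("feminine", "younger"): ["voice_f_young_a", "voice_f_young_b"],
    ("feminine", "older"): ["voice_f_old_a"],
    ("masculine", "younger"): ["voice_m_young_a", "voice_m_young_b"],
    ("masculine", "older"): ["voice_m_old_a"],
}
NARRATOR_VOICE = "voice_narrator"


def bucket(character):
    """Reduce the two scores to a coarse (gender, age) bucket."""
    gender = "masculine" if character.gender_score >= 0.5 else "feminine"
    age = "older" if character.age_score >= 0.5 else "younger"
    return gender, age


def assign_voices(characters):
    """Map each character name to a voice, rotating within its bucket."""
    rotations = {key: cycle(pool) for key, pool in VOICE_POOLS.items()}
    voices = {"narrator": NARRATOR_VOICE}
    for character in characters:
        voices[character.name] = next(rotations[bucket(character)])
    return voices
```

Rotating through each pool keeps two characters in the same bucket from sharing a voice until that pool is exhausted.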

Quick Start & Requirements

  • Installation: Docker (recommended) or direct run via uv and Python 3.12.
  • Prerequisites:
    • Docker with host networking enabled.
    • An OpenAI-compatible LLM endpoint (e.g., LM Studio).
    • Kokoro TTS model setup (via Kokoro-FastAPI).
    • For direct run: calibre (optional, for M4B) and ffmpeg.
  • Setup: Detailed instructions for Docker and direct installation are provided.
  • Links: Demo Video
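
As a rough illustration of how a client might talk to the TTS prerequisite, the sketch below builds a request for an OpenAI-style speech endpoint. The base URL, port, `/audio/speech` path, model name `kokoro`, and the example voice are assumptions about a typical Kokoro-FastAPI setup, so check them against your own deployment.

```python
import json
import urllib.request


def build_speech_request(text, voice,
                         base_url="http://localhost:8880/v1",  # assumed address
                         audio_format="wav"):
    """Build a POST request for an OpenAI-style speech endpoint (assumed shape)."""
    payload = {
        "model": "kokoro",            # assumed model name
        "input": text,
        "voice": voice,
        "response_format": audio_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as handle:
        handle.write(response.read())
```

A call such as `synthesize("Hello there.", "af_bella", "hello.wav")` would only succeed with a compatible server running at the assumed address.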

Highlighted Details

  • Gradio UI for user-friendly audiobook creation.
  • M4B audiobook creation with metadata and chapter timestamps.
  • Multi-format input (EPUB, PDF, TXT) and output (AAC, M4A, MP3, WAV, OPUS, FLAC, PCM, M4B).
  • Parallel batch inference via Kokoro TTS for faster audio generation.
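
Parallel batch inference can be sketched with a thread pool that sends several lines to the TTS backend at once while preserving their order. This is an illustration, not the project's code: `synthesize_batch` and the `synthesize` callable it receives are hypothetical names, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_batch(lines, synthesize, max_workers=4):
    """Synthesize (voice, text) pairs concurrently, returning results in input order.

    `synthesize` is any callable taking (voice, text) and returning audio bytes;
    it stands in for a real TTS request.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # even when individual calls finish out of order.
        return list(pool.map(lambda line: synthesize(*line), lines))
```

Threads suit this workload because each call mostly waits on a network response from the TTS server. Keeping results in input order means the audio segments can be concatenated directly.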

Maintenance & Community

  • Contributions are welcome via GitHub issues and pull requests.
  • Donations are accepted via PayPal.
  • GitHub Repository

Licensing & Compatibility

  • Licensed under GNU General Public License v3.0 (GPL-3.0).
  • GPL-3.0 is a strong copyleft license: distributed derivative works must also be released under GPL-3.0, with source code available. Commercial use is permitted, but combining the tool with closed-source software requires careful consideration of these obligations.

Limitations & Caveats

  • GPU-accelerated inference is CUDA-based, so Apple Silicon GPUs are not currently supported; CPU inference is the alternative on those machines.
  • The effectiveness of character voice attribution relies on the accuracy of the underlying NLP and LLM models.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 1

Star History

  • 64 stars in the last 90 days
