alexandria-audiobook by Finrandojin

AI audiobook generator for voiceover production

Created 5 months ago

811 stars

Top 42.8% on SourcePulse

Project Summary

A multi-voice AI audiobook generator, Alexandria transforms books into audiobooks using LLM-driven script annotation and Qwen3-TTS. It targets users needing automated, high-fidelity audiobook production with advanced voice customization, offering unique character voices, cloning, and fine-tuning capabilities.

How It Works

Alexandria employs an AI pipeline that first uses an LLM to parse book text into a structured JSON format, identifying speakers, dialogue, and TTS instructions. An optional second LLM pass refines annotations. The core is the Qwen3-TTS engine, which can run locally or remotely, synthesizing speech with per-line style control. Novelty lies in its comprehensive voice generation suite: cloning from short audio samples, designing voices from text descriptions, and persistent voice identity training via LoRA fine-tuning.

Quick Start & Requirements

Installation is recommended via Pinokio, or a Google Colab notebook is available. A separate OpenAI-compatible LLM server (e.g., LM Studio, Ollama) must be running. A GPU with 8 GB VRAM minimum (16 GB+ recommended) is advised for optimal performance, supporting NVIDIA (CUDA 12.8+) and AMD (ROCm 6.3+ on Linux). CPU mode is available but significantly slower. Requires ~20 GB disk space and 16 GB RAM recommended. Initial TTS model downloads (~3.5 GB per variant) occur on first use.

Highlighted Details

Advanced Voice Synthesis: Features voice cloning from 5-15 second references, text-based voice design, and LoRA fine-tuning for custom, persistent voice identities.
LLM-Powered Scripting: Automates script annotation, including character dialogue, narration, and context-aware non-verbal sounds, with an optional review pass for error correction.
Flexible Export: Generates a single MP3 audiobook or prepares multi-track Audacity projects with per-speaker WAV files and labels for detailed editing.
Performance: Achieves 3-6x real-time throughput using batch rendering with codec compilation enabled.

Maintenance & Community

The project notes a recent surge in user attention, which may lead to slower issue response times. No specific community channels (Discord/Slack) or prominent contributors are detailed in the README.

Licensing & Compatibility

Licensed under MIT, Alexandria is generally compatible with commercial use and closed-source linking without significant restrictions.

Limitations & Caveats

A separate LLM server is a mandatory prerequisite. AMD GPUs on Windows and Apple Silicon Macs are limited to CPU processing, resulting in substantially slower performance. Initial TTS model downloads require stable internet connectivity.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

137 stars in the last 30 days