alexandria-audiobook  by Finrandojin

AI audiobook generator for voiceover production

Created 3 weeks ago

New!

325 stars

Top 84.1% on SourcePulse

GitHubView on GitHub
Project Summary

A multi-voice AI audiobook generator, Alexandria transforms books into audiobooks using LLM-driven script annotation and Qwen3-TTS. It targets users needing automated, high-fidelity audiobook production with advanced voice customization, offering unique character voices, cloning, and fine-tuning capabilities.

How It Works

Alexandria employs an AI pipeline that first uses an LLM to parse book text into a structured JSON format, identifying speakers, dialogue, and TTS instructions. An optional second LLM pass refines annotations. The core is the Qwen3-TTS engine, which can run locally or remotely, synthesizing speech with per-line style control. Novelty lies in its comprehensive voice generation suite: cloning from short audio samples, designing voices from text descriptions, and persistent voice identity training via LoRA fine-tuning.

Quick Start & Requirements

Installation is recommended via Pinokio, or a Google Colab notebook is available. A separate OpenAI-compatible LLM server (e.g., LM Studio, Ollama) must be running. A GPU with 8 GB VRAM minimum (16 GB+ recommended) is advised for optimal performance, supporting NVIDIA (CUDA 12.8+) and AMD (ROCm 6.3+ on Linux). CPU mode is available but significantly slower. Requires ~20 GB disk space and 16 GB RAM recommended. Initial TTS model downloads (~3.5 GB per variant) occur on first use.

Highlighted Details

  • Advanced Voice Synthesis: Features voice cloning from 5-15 second references, text-based voice design, and LoRA fine-tuning for custom, persistent voice identities.
  • LLM-Powered Scripting: Automates script annotation, including character dialogue, narration, and context-aware non-verbal sounds, with an optional review pass for error correction.
  • Flexible Export: Generates a single MP3 audiobook or prepares multi-track Audacity projects with per-speaker WAV files and labels for detailed editing.
  • Performance: Achieves 3-6x real-time throughput using batch rendering with codec compilation enabled.

Maintenance & Community

The project notes a recent surge in user attention, which may lead to slower issue response times. No specific community channels (Discord/Slack) or prominent contributors are detailed in the README.

Licensing & Compatibility

Licensed under MIT, Alexandria is generally compatible with commercial use and closed-source linking without significant restrictions.

Limitations & Caveats

A separate LLM server is a mandatory prerequisite. AMD GPUs on Windows and Apple Silicon Macs are limited to CPU processing, resulting in substantially slower performance. Initial TTS model downloads require stable internet connectivity.

Health Check
Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
12
Star History
328 stars in the last 22 days

Explore Similar Projects

Starred by Georgios Konstantopoulos Georgios Konstantopoulos(CTO, General Partner at Paradigm) and Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

0.4%
55k
Few-shot voice cloning and TTS web UI
Created 2 years ago
Updated 2 weeks ago
Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Didier Lopes Didier Lopes(Founder of OpenBB), and
14 more.

Real-Time-Voice-Cloning by CorentinJ

0.1%
59k
Voice cloning for real-time speech generation
Created 6 years ago
Updated 2 months ago
Feedback? Help us improve.