Audiobook creator using open-source TTS/STS models
This project provides a Windows GUI application for creating audiobooks using deep learning text-to-speech (TTS) and speech-to-speech (S2S) models. It targets users who want to generate high-quality audiobooks with features like multi-speaker support, in-progress saving, and bulk editing, leveraging advancements in AI for a more seamless workflow.
How It Works
The application employs a modular architecture, moving from a monolithic design to a closer approximation of Model-View-Controller (MVC). This separation of concerns into view.py (GUI), controller.py (logic), and model.py (functional code) enhances maintainability and facilitates the integration of new TTS and S2S engines. Each engine is configured dynamically, requiring only a defined loading and generation procedure that returns an audio path to model.py.
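The engine contract described above (a loading procedure plus a generation procedure that hands back an audio path) could be sketched roughly as follows. This is an illustrative assumption, not the project's actual code: the names Engine, ENGINES, register_engine, and synthesize are hypothetical, and the "silence" engine is a stand-in that writes an empty WAV file instead of running a real TTS model.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict
import tempfile
import wave


@dataclass
class Engine:
    """Hypothetical engine record: a loader and a generator returning an audio path."""
    load: Callable[[], Any]
    generate: Callable[[Any, str, Path], Path]


# Hypothetical registry that a model layer could consult by engine name.
ENGINES: Dict[str, Engine] = {}


def register_engine(name: str, load: Callable[[], Any],
                    generate: Callable[[Any, str, Path], Path]) -> None:
    """Make an engine available under the given name."""
    ENGINES[name] = Engine(load=load, generate=generate)


def _load_silence() -> dict:
    # Stand-in for loading model weights: just returns audio settings.
    return {"sample_rate": 16000, "frames_per_char": 160}


def _generate_silence(model: dict, text: str, out_dir: Path) -> Path:
    # Stand-in for synthesis: writes silence whose length scales with the text.
    out_path = out_dir / "sentence.wav"
    n_frames = model["frames_per_char"] * len(text)
    with wave.open(str(out_path), "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(model["sample_rate"])
        wav.writeframes(b"\x00\x00" * n_frames)
    return out_path


register_engine("silence", _load_silence, _generate_silence)


def synthesize(engine_name: str, text: str, out_dir: Path) -> Path:
    """Load the named engine, generate audio for the text, and return its path."""
    engine = ENGINES[engine_name]
    model = engine.load()
    return engine.generate(model, text, out_dir)
```

Under this sketch, adding another engine would only mean calling register_engine with a different pair of functions; the caller of synthesize would not need to change.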
Quick Start & Requirements
Setup involves installing dependencies (pip install -r requirements.txt), initializing and updating submodules, and launching the controller (python src/controller.py).
Highlighted Details
Maintenance & Community
The project is actively maintained by JarodMica. Updates are managed via git pull and git submodule update. The README provides instructions for handling potential conflicts if local modifications have been made.
Licensing & Compatibility
The engines used by the core application are licensed under MIT or Apache-2.0. However, the pre-trained models carry their own usage limitations: StyleTTS 2 requires attribution or explicit permission for synthesized voices, and F5-TTS uses a base model licensed under CC-BY-NC-4.0, which restricts commercial use.
Limitations & Caveats
The application is primarily designed for Windows. Because its GUI is built on a framework other than Gradio, it cannot be run on cloud machines and requires local hardware. The F5-TTS base model's non-commercial license restricts its use in commercial audiobook production. Torch version management can be complex, and Torch may need to be reinstalled after adding different engines.