PyTorch framework for non-autoregressive text-to-speech (NAR-TTS)
This repository provides a PyTorch framework for Non-Autoregressive Text-to-Speech (NAR-TTS), featuring official implementations of PortaSpeech and DiffSpeech. It targets researchers and developers in speech synthesis, offering a scalable platform for training and inference with a focus on high-quality, efficient speech generation.
How It Works
The framework uses non-autoregressive models, which generate all speech feature frames in parallel rather than one at a time. This makes inference substantially faster than with autoregressive models. Data preprocessing relies on the Montreal Forced Aligner (MFA) to align text with audio, and a custom random-access dataset implementation handles large speech corpora efficiently.
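The random-access dataset idea can be illustrated with a minimal sketch. It assumes a common layout: pickled items are appended to one binary file, and a separate index stores each item's byte offset, so item i is read with a single seek instead of a scan. The class names IndexedWriter and IndexedReader, the .data/.idx suffixes, and the example fields are hypothetical and are not the repository's actual API.

```python
import os
import pickle
import tempfile
from typing import Any, List


class IndexedWriter:
    """Hypothetical writer: appends pickled items to one file and records byte offsets."""

    def __init__(self, path: str) -> None:
        self.data_path = path + ".data"  # assumed suffix for the item blobs
        self.index_path = path + ".idx"  # assumed suffix for the offset table
        self._file = open(self.data_path, "wb")
        self._offsets: List[int] = [0]

    def add(self, item: Any) -> None:
        blob = pickle.dumps(item)
        self._file.write(blob)
        # The end offset of this item is the start offset of the next one.
        self._offsets.append(self._offsets[-1] + len(blob))

    def close(self) -> None:
        self._file.close()
        with open(self.index_path, "wb") as f:
            pickle.dump(self._offsets, f)


class IndexedReader:
    """Hypothetical reader: item i occupies bytes [offsets[i], offsets[i + 1])."""

    def __init__(self, path: str) -> None:
        with open(path + ".idx", "rb") as f:
            self._offsets: List[int] = pickle.load(f)
        self._file = open(path + ".data", "rb")

    def __len__(self) -> int:
        return len(self._offsets) - 1

    def __getitem__(self, i: int) -> Any:
        # Negative indices are rejected to keep the sketch simple.
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self._offsets[i], self._offsets[i + 1]
        self._file.seek(start)  # jump straight to the item, no scanning
        return pickle.loads(self._file.read(end - start))

    def close(self) -> None:
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "train")
        writer = IndexedWriter(prefix)
        for i in range(3):
            # Hypothetical per-utterance record.
            writer.add({"item_name": f"utt{i}", "mel_len": 100 + i})
        writer.close()
        reader = IndexedReader(prefix)
        print(len(reader), reader[2]["item_name"])  # prints: 3 utt2
        reader.close()
```

Only the offset table is kept in memory, so a data loader can fetch arbitrary items in shuffled order without loading the whole corpus.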
Quick Start & Requirements
Set PYTHONPATH, create and activate a virtual environment, install dependencies via pip install -r requirements.txt, and install MFA using bash mfa_usr/install_mfa.sh. System-level dependencies include sox and libsox-fmt-mp3. Tested on Linux/Ubuntu 18.04.
Highlighted Details
Maintenance & Community
The project is associated with the NATSpeech research group. Links to Hugging Face demos and Chinese documentation are provided.
Licensing & Compatibility
The project is released under a custom license. It explicitly prohibits using the technology to generate speech of individuals without their consent, citing potential copyright violations.
Limitations & Caveats
The license imposes significant usage restrictions, particularly on generating the speech of specific individuals, which may limit commercial or broader research use. Installation requires specific versions of PyTorch and NumPy, plus system-level dependencies such as sox.
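Because the setup depends on both pinned Python packages and system binaries, a quick environment report can catch missing pieces before training. The following is a minimal sketch under stated assumptions: the exact versions are not given here, so it only reports what is installed (compare the output against the pins, presumably in requirements.txt); it checks sox on PATH but cannot detect the libsox-fmt-mp3 package. The function names installed_version and environment_report are hypothetical.

```python
import shutil
from importlib import metadata
from typing import Dict, Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> Dict[str, Optional[str]]:
    """Hypothetical helper covering the dependencies named above."""
    return {
        # pip distribution names; compare these against the pinned versions.
        "torch": installed_version("torch"),
        "numpy": installed_version("numpy"),
        # sox is a system binary, so look it up on PATH rather than via pip.
        "sox": shutil.which("sox"),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name:6s} {value or 'MISSING'}")
```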
Last activity 2 years ago; the project is inactive.