PyTorch for portable, high-quality generative TTS
This project is a PyTorch implementation of PortaSpeech, a portable and high-quality generative text-to-speech (TTS) model. It targets researchers and developers who need efficient, controllable, high-fidelity speech synthesis, and it provides pre-trained models along with instructions for inference and training.
How It Works
PortaSpeech combines a variational generator with a flow-based post-net to synthesize high-quality speech. It includes a linguistic encoder and lets you control speaking rate through duration ratios, drawing on FastSpeech2. To avoid "mashed output", this implementation omits ReLU activation and LayerNorm in the VariationalGenerator.
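Speaking-rate control through duration ratios can be illustrated with a minimal sketch of a FastSpeech2-style length regulator. This is not the repository's code: the function names `scale_durations` and `length_regulate` are hypothetical, phoneme labels stand in for the encoder's hidden vectors, and the sketch assumes that a ratio above 1 lengthens each phoneme (slower speech) while a ratio below 1 shortens it.

```python
from typing import List, Sequence


def scale_durations(durations: Sequence[int], ratio: float) -> List[int]:
    """Scale per-phoneme frame counts by a duration ratio (hypothetical helper)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Round half up to whole frames; assumption: every phoneme keeps >= 1 frame.
    return [max(1, int(d * ratio + 0.5)) for d in durations]


def length_regulate(items: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each phoneme-level item once per output frame it should cover."""
    if len(items) != len(durations):
        raise ValueError("items and durations must have the same length")
    frames: List[str] = []
    for item, n_frames in zip(items, durations):
        frames.extend([item] * n_frames)
    return frames


if __name__ == "__main__":
    phonemes = ["HH", "AH", "L"]
    predicted = [2, 3, 4]  # illustrative frame counts, not real model output
    for ratio in (0.5, 1.0, 1.5):
        scaled = scale_durations(predicted, ratio)
        print(ratio, scaled, len(length_regulate(phonemes, scaled)))
```

In a real model the repeated items would be encoder hidden states and the durations would come from a duration predictor; the scaling step is the only part the ratio touches.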
Quick Start & Requirements
Install dependencies: pip3 install -r requirements.txt
Pre-trained models: place checkpoints in output/ckpt/DATASET/
Inference: python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single --dataset DATASET
Training: python3 train.py --dataset DATASET (supports single-node multi-GPU training and Automatic Mixed Precision)
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
3 years ago
Inactive