PortaSpeech by keonlee9420

PyTorch for portable, high-quality generative TTS

created 3 years ago
339 stars

Top 82.4% on sourcepulse

Project Summary

This repository is a PyTorch implementation of PortaSpeech, a portable, high-quality generative text-to-speech (TTS) model. It targets researchers and developers who want efficient, controllable, high-fidelity speech synthesis, and it provides pre-trained models along with instructions for inference and training.

How It Works

PortaSpeech combines a linguistic encoder, a variational generator, and a flow-based post-net to synthesize speech. Speaking rate can be controlled through duration ratios, in the style of FastSpeech2. To avoid "mashed output", the implementation omits the ReLU activation and LayerNorm in the VariationalGenerator.
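The duration-ratio control can be illustrated with a minimal sketch of a FastSpeech2-style length regulator. This is not the repository's code: the function name `regulate_length`, the rounding rule, and the toy inputs are assumptions made for illustration. Each phoneme-level vector is repeated for its scaled number of frames, so a ratio above 1.0 slows speech down and a ratio below 1.0 speeds it up.

```python
from typing import List, Sequence


def regulate_length(
    hidden: Sequence[Sequence[float]],
    durations: Sequence[int],
    duration_ratio: float = 1.0,
) -> List[List[float]]:
    """Expand phoneme-level vectors to frame level (hypothetical helper).

    Each duration is scaled by `duration_ratio`, rounded half up, and
    clamped at zero; the matching vector is repeated that many times.
    """
    if len(hidden) != len(durations):
        raise ValueError("hidden and durations must have the same length")
    frames: List[List[float]] = []
    for vector, duration in zip(hidden, durations):
        scaled = max(0, int(duration * duration_ratio + 0.5))
        frames.extend(list(vector) for _ in range(scaled))
    return frames


# Toy input (assumed): three phonemes with 2-dimensional hidden vectors.
phonemes = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 3, 1]

normal = regulate_length(phonemes, durations, 1.0)
slower = regulate_length(phonemes, durations, 2.0)
faster = regulate_length(phonemes, durations, 0.5)
print(len(normal), len(slower), len(faster))
```

In this sketch the frame count grows or shrinks with the ratio while the phoneme order is preserved; a real model would apply the same idea to predicted durations and tensor-valued hidden states.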

Quick Start & Requirements

  • Install dependencies: pip3 install -r requirements.txt
  • Dockerfile is provided.
  • Download pre-trained models and place them in output/ckpt/DATASET/.
  • Inference: python3 synthesize.py --text "YOUR_DESIRED_TEXT" --restore_step RESTORE_STEP --mode single --dataset DATASET
  • Preprocessing requires Montreal Forced Aligner (MFA) for alignment. Pre-extracted alignments are available.
  • Training: python3 train.py --dataset DATASET (supports single-node multi-GPU training and Automatic Mixed Precision).
  • Documentation: demo, prepare_align.py, preprocess.py, train.py
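As a rough sketch of how the inference step above might be scripted, the helper below assembles the `synthesize.py` command line from the flags listed in this section. The helper names, the dataset name `MyDataset`, and the restore step `1000` are placeholders chosen for illustration, not values taken from the repository; the command is only printed, not executed.

```python
import shlex
from pathlib import Path
from typing import List


def checkpoint_dir(dataset: str, root: str = "output/ckpt") -> Path:
    """Directory where pre-trained checkpoints are expected for a dataset."""
    return Path(root) / dataset


def build_synthesize_command(
    text: str, restore_step: int, dataset: str, mode: str = "single"
) -> List[str]:
    """Build the argument list for single-utterance inference."""
    return [
        "python3", "synthesize.py",
        "--text", text,
        "--restore_step", str(restore_step),
        "--mode", mode,
        "--dataset", dataset,
    ]


# Placeholder values (assumptions): substitute your own dataset and step.
command = build_synthesize_command("Hello world", 1000, "MyDataset")
print("Checkpoints expected in:", checkpoint_dir("MyDataset").as_posix())
print("Command:", shlex.join(command))
```

Passing the list form to `subprocess.run` would avoid shell-quoting problems when the text contains spaces or quotes.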

Highlighted Details

  • Offers "Normal" (24M parameters) and "Small" (7.6M parameters) model variants.
  • Supports controllable speaking rate and two helper losses (CTC, DGA) for improved word-to-phoneme alignment.
  • Compatible with HiFi-GAN and MelGAN vocoders.
  • TensorBoard integration for monitoring training progress.
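To put the two model sizes in perspective, the following back-of-the-envelope sketch estimates the memory needed for the weights alone. It uses the rounded parameter counts listed above and assumes 4 bytes per parameter (32-bit floats) or 2 bytes (16-bit floats); actual checkpoint files may differ, for example if they also store optimizer state.

```python
def weight_megabytes(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the raw weights in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


# Rounded parameter counts from the list above.
VARIANTS = {"Normal": 24_000_000, "Small": 7_600_000}

for name, num_params in VARIANTS.items():
    fp32 = weight_megabytes(num_params, 4)
    fp16 = weight_megabytes(num_params, 2)
    print(f"{name}: ~{fp32:.1f} MB at 32-bit, ~{fp16:.1f} MB at 16-bit")
```

Under these assumptions the "Small" variant needs roughly a third of the weight memory of the "Normal" variant.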

Maintenance & Community

  • The project is maintained by keonlee9420.
  • References include VITS, Glow-TTS, and other TTS projects by the same author.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • Output quality has room for improvement, and there may be a trade-off between audio quality and alignment accuracy.
  • Future extension to multi-speaker TTS is planned.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
