ParallelWaveGAN by kan-bayashi

PyTorch vocoder for real-time speech synthesis, based on Parallel WaveGAN

Created 6 years ago

1,637 stars

Top 25.5% on SourcePulse

3 Experts Love This Project

Omar Sanseviero

DevRel at Google DeepMind

Casper Hansen

Author of AutoAWQ

Jong Wook Kim

Research Scientist at OpenAI

Project Summary

This repository provides unofficial PyTorch implementations of state-of-the-art non-autoregressive neural vocoders, including Parallel WaveGAN, MelGAN, Multi-band MelGAN, HiFi-GAN, and StyleMelGAN. It aims to enable real-time neural vocoding for text-to-speech and singing voice synthesis, offering compatibility with ESPnet-TTS and other Tacotron2-based implementations.

How It Works

The project implements several GAN-based vocoder architectures that generate audio waveforms from mel-spectrograms. The models combine adversarial training with techniques such as multi-band processing to produce high-fidelity audio at fast inference speeds. Because they are non-autoregressive, they generate all output samples in parallel rather than one sample at a time, which is what makes real-time synthesis feasible.
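To make the non-autoregressive point concrete, here is a minimal sketch of the length bookkeeping such a vocoder relies on: each mel frame expands to a fixed number of waveform samples, so the full output length is known before generation starts and every sample can be produced in one parallel pass. The function names, the upsampling factors (whose product gives a hop size of 256), and the 22,050 Hz sampling rate are illustrative assumptions, not values taken from this repository's configs.

```python
import math
from typing import Sequence


def waveform_length(n_frames: int, upsample_scales: Sequence[int]) -> int:
    """Return the number of samples generated from n_frames mel frames.

    The product of the upsampling factors is the hop size, i.e. how many
    waveform samples correspond to one mel frame.
    """
    hop_size = math.prod(upsample_scales)
    return n_frames * hop_size


def duration_seconds(n_samples: int, sampling_rate: int) -> float:
    """Convert a sample count to a duration in seconds."""
    return n_samples / sampling_rate


# Hypothetical settings, chosen only for illustration.
UPSAMPLE_SCALES = [4, 4, 4, 4]  # product = 256 samples per mel frame
SAMPLING_RATE = 22050           # Hz

n_samples = waveform_length(200, UPSAMPLE_SCALES)
print(n_samples)                                              # 51200
print(round(duration_seconds(n_samples, SAMPLING_RATE), 3))   # 2.322
```

An autoregressive vocoder would need 51,200 sequential steps for the same clip, whereas here the output shape is fixed by the input alone.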

Quick Start & Requirements

Install: pip install -e . (after git clone) or via make in the tools directory.
Prerequisites: Python 3.8+, CUDA 11.0+, cuDNN 8+, NCCL 2+, libsndfile, jq, sox. Tested with PyTorch 1.8.1 to 2.1.0.
Setup: After the pip install, no further setup is needed for inference. Training recipes are provided in the same style as ESPnet's.
Docs: ESPnet2 Demo, ESPnet1 Demo, Muskits Demo

Highlighted Details

Supports multiple languages (English, Japanese, Mandarin, Korean) and singing voice synthesis.
Achieves very fast inference, with a real-time factor (RTF, synthesis time divided by audio duration) as low as 0.001 on GPU.
Offers numerous pre-trained models for various datasets and architectures.
Provides detailed recipes and examples for integration with ESPnet-TTS.
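The RTF figure above can be read with a small worked example. The following is a sketch, not the repository's benchmarking code: `real_time_factor` and `measure_rtf` are hypothetical helper names, and `synthesize` stands in for any callable that runs a vocoder.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced.

    Values below 1.0 mean faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[], None],
                n_samples: int,
                sampling_rate: int) -> float:
    """Time one call to synthesize() and return its RTF.

    n_samples is the length of the waveform that synthesize() produces.
    """
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, n_samples / sampling_rate)


# An RTF of about 0.001 means 10 s of audio takes about 0.01 s to generate.
print(real_time_factor(0.01, 10.0))
```

When timing a GPU model this way, the device has to be synchronized before the clock is read, since GPU kernels are launched asynchronously and the measured time would otherwise be too short.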

Maintenance & Community

The repository is maintained by Tomoki Hayashi (@kan-bayashi). Updates include new recipes and support for singing voice vocoders.

Licensing & Compatibility

The license of each pre-trained model depends on the corpus used to train it. Some of the code is derived from ESPnet and Kaldi (Apache-2.0). Users must verify dataset licenses before commercial use.

Limitations & Caveats

The implementations are unofficial. Because the terms of use of each pre-trained model follow those of its training corpus, users are responsible for checking those terms before any commercial use and for any legal disputes that result.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

7 stars in the last 30 days