PyTorch implementation of MelGAN vocoder
This repository is a PyTorch implementation of MelGAN, a fast and efficient neural vocoder that converts mel-spectrograms into high-fidelity audio waveforms. MelGAN is a lighter and faster alternative to models such as WaveGlow and offers improved generalization to unseen speakers. The implementation is compatible with NVIDIA's Tacotron2, so Tacotron2's mel-spectrogram outputs can be converted to audio directly.
How It Works
MelGAN uses a Generative Adversarial Network (GAN) architecture. A generator network synthesizes audio from mel-spectrograms, and a discriminator network learns to distinguish real audio from generated audio. The key advantage is the generator design: it is fully convolutional and non-autoregressive, so it produces all waveform samples in parallel instead of one at a time. This gives faster inference and lower computational requirements than autoregressive models while maintaining high audio quality.
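The two roles described above can be illustrated with a small amount of arithmetic. The sketch below is plain Python, not this repository's PyTorch code, and it rests on assumptions: a generator that upsamples each mel frame with strides 8, 8, 2, 2 (a hop length of 256 samples), and a hinge-loss adversarial objective, both as commonly described for MelGAN. The repository's config and losses may differ, and any additional generator loss terms are omitted. All names (`UPSAMPLE_RATES`, `hop_length`, `output_samples`, `discriminator_hinge_loss`, `generator_adversarial_loss`) are hypothetical.

```python
import math
from typing import Sequence

# ASSUMPTION: hypothetical upsampling schedule. A MelGAN-style generator stacks
# transposed convolutions whose strides multiply to the mel hop length.
UPSAMPLE_RATES = (8, 8, 2, 2)


def hop_length(rates: Sequence[int]) -> int:
    """Audio samples produced per mel frame: the product of the upsampling strides."""
    return math.prod(rates)


def output_samples(n_frames: int, rates: Sequence[int] = UPSAMPLE_RATES) -> int:
    """Waveform length the generator emits for a mel-spectrogram with n_frames frames.

    Every frame is expanded in parallel, which is why generation is not sample-by-sample.
    """
    return n_frames * hop_length(rates)


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def discriminator_hinge_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """ASSUMPTION: hinge loss. Penalizes D(real) below +1 and D(generated) above -1."""
    real_term = _mean([max(0.0, 1.0 - s) for s in real_scores])
    fake_term = _mean([max(0.0, 1.0 + s) for s in fake_scores])
    return real_term + fake_term


def generator_adversarial_loss(fake_scores: Sequence[float]) -> float:
    """The generator lowers this loss by raising the discriminator's scores on its audio."""
    return -_mean(fake_scores)


if __name__ == "__main__":
    print(output_samples(100))                                       # 100 frames * 256
    print(discriminator_hinge_loss([1.5, 0.5], [-2.0, 0.0]))
    print(generator_adversarial_loss([-2.0, 0.0]))
```

Under these assumptions, a 100-frame mel-spectrogram yields 25,600 samples, and the discriminator incurs zero loss only once real scores exceed +1 and generated scores fall below -1.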
Quick Start & Requirements
pip install -r requirements.txt
python preprocess.py
python trainer.py -c [config yaml file] -n [name of the run]
python inference.py -p [checkpoint path] -i [input mel path]
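The training and inference commands above can be scripted from Python. The following is a minimal sketch that uses only the flags listed above (`-c`, `-n`, `-p`, `-i`). It assumes the scripts are run from the repository root, and it leaves out `preprocess.py` because its arguments are not listed here. The helper names and the example paths are hypothetical.

```python
import shlex
import subprocess
from typing import List


def trainer_command(config_yaml: str, run_name: str) -> List[str]:
    """Argument list for: python trainer.py -c [config yaml file] -n [name of the run]."""
    return ["python", "trainer.py", "-c", config_yaml, "-n", run_name]


def inference_command(checkpoint_path: str, mel_path: str) -> List[str]:
    """Argument list for: python inference.py -p [checkpoint path] -i [input mel path]."""
    return ["python", "inference.py", "-p", checkpoint_path, "-i", mel_path]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Print the shell-quoted command and execute it only when dry_run is False.

    ASSUMPTION: the current working directory is the repository root.
    """
    printable = shlex.join(cmd)
    print(printable)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return printable


if __name__ == "__main__":
    # Hypothetical config, run name, checkpoint, and mel paths.
    run(trainer_command("config/default.yaml", "my_run"))
    run(inference_command("chkpt/my_run.pt", "mel file.pt"))
```

Passing an argument list (rather than a single string) to `subprocess.run` avoids shell-quoting problems with paths that contain spaces; `shlex.join` is used only to print a copy-pasteable version.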
Limitations & Caveats
The repository notes that utils/hparams.py is taken from a project with an unspecified license, which may raise compatibility issues for commercial use. A Google Colab demo is still marked as "TODO".
Status: inactive (last activity 4 years ago).