PyTorch implementation of UnivNet vocoder for high-fidelity waveform generation
Top 95.2% on sourcepulse
UnivNet provides an unofficial PyTorch implementation of the UnivNet neural vocoder, designed for high-fidelity audio waveform generation. It targets researchers and developers working on text-to-speech (TTS) systems who require a fast and accurate vocoder, claiming superior objective and subjective performance over HiFi-GAN.
How It Works
UnivNet employs a multi-resolution spectrogram discriminator architecture, a key innovation that allows it to capture audio details across different frequency scales. This approach, combined with a GAN framework, enables high-fidelity waveform synthesis. The implementation leverages the same mel-spectrogram calculation as HiFi-GAN for compatibility with popular TTS models like Tacotron2.
Quick Start & Requirements
Install dependencies:

pip install -r requirements.txt

Each line of the training filelist should follow the format:

path_to_wav|transcript|speaker_id

Copy config/default_c32.yaml to config/config.yaml and update the data paths, then launch training:

python trainer.py -c CONFIG_YAML_FILE -n NAME_OF_THE_RUN

Generate waveforms from mel-spectrograms with:

python inference.py -p CHECKPOINT_PATH -i INPUT_MEL_PATH -o OUTPUT_WAV_PATH
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The hop_length parameter for mel-spectrogram calculation is fixed at 256 and cannot be changed. The implementation is unofficial.
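The fixed hop length determines the upsampling factor between mel frames and audio samples: each frame maps to exactly 256 output samples. A small sketch of that arithmetic (the 22.05 kHz sample rate in the comment is an assumption, not confirmed by the repo):

```python
HOP_LENGTH = 256  # fixed in this implementation

def expected_wav_samples(n_mel_frames: int) -> int:
    """Output waveform length for a given number of mel frames."""
    return n_mel_frames * HOP_LENGTH

# A 100-frame mel produces 25600 samples
# (about 1.16 s at an assumed 22.05 kHz sample rate).
print(expected_wav_samples(100))  # 25600
```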