univnet by maum-ai

PyTorch implementation of UnivNet vocoder for high-fidelity waveform generation

created 4 years ago
274 stars

Top 95.2% on sourcepulse

Project Summary

This repository is an unofficial PyTorch implementation of the UnivNet neural vocoder, which generates high-fidelity audio waveforms from mel-spectrograms. It targets researchers and developers working on text-to-speech (TTS) systems who need a fast and accurate vocoder. The project claims better objective and subjective performance than HiFi-GAN.

How It Works

UnivNet is a GAN-based vocoder whose key innovation is a multi-resolution spectrogram discriminator: the discriminator inspects spectrograms of the audio computed at several resolutions, so it can capture detail across different frequency scales. Training the generator against it is what enables high-fidelity waveform synthesis. The implementation uses the same mel-spectrogram calculation as HiFi-GAN, which keeps it compatible with popular TTS models such as Tacotron2.
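
To make the "multi-resolution" idea concrete, here is a minimal NumPy sketch that computes magnitude spectrograms of one waveform at three STFT resolutions. It only illustrates the kind of input such a discriminator looks at; it is not the repository's code, and the discriminator networks themselves are omitted. The three (n_fft, hop_length, win_length) triples and the function names are assumptions chosen for illustration.

```python
import numpy as np

# Assumed (n_fft, hop_length, win_length) triples, for illustration only.
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]


def stft_magnitude(wave, n_fft, hop_length, win_length):
    """Return a (frames, n_fft // 2 + 1) magnitude spectrogram (no padding)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(wave) - win_length) // hop_length
    frames = np.stack([
        wave[i * hop_length: i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # rfft zero-pads each windowed frame from win_length up to n_fft.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))


def multi_resolution_spectrograms(wave):
    """Compute one magnitude spectrogram per assumed resolution."""
    return [stft_magnitude(wave, *res) for res in RESOLUTIONS]


# One second of a 440 Hz tone at the 24,000 Hz rate the project expects.
sample_rate = 24000
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 440.0 * t)

shapes = [spec.shape for spec in multi_resolution_spectrograms(wave)]
```

A short window with a small hop gives many frames and few frequency bins (fine time detail), while a long window gives the opposite. Feeding several such views to separate sub-discriminators is the general idea behind the design described above.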

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.6, PyTorch 1.6.0, NumPy 1.17.4, SciPy 1.5.4. Requires audio data sampled at 24,000 Hz.
  • Data: LibriTTS dataset is recommended. Metadata format: path_to_wav|transcript|speaker_id.
  • Configuration: Copy config/default_c32.yaml to config/config.yaml and update data paths.
  • Training: python trainer.py -c CONFIG_YAML_FILE -n NAME_OF_THE_RUN
  • Inference: python inference.py -p CHECKPOINT_PATH -i INPUT_MEL_PATH -o OUTPUT_WAV_PATH
  • Pre-trained Models: Available via Google Drive links in the README.
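
The metadata format listed above (path_to_wav|transcript|speaker_id) can be checked before training with a small parser. The sketch below is not part of the repository: the `Utterance` type, the `parse_metadata` function, and the sample paths are hypothetical, and it assumes transcripts never contain the `|` character.

```python
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    wav_path: str
    transcript: str
    speaker_id: str


def parse_metadata(lines: Iterable[str]) -> List[Utterance]:
    """Parse 'path_to_wav|transcript|speaker_id' lines, skipping blank ones."""
    entries = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split("|")
        if len(parts) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(parts)}"
            )
        entries.append(Utterance(*parts))
    return entries


# Hypothetical sample lines; real paths depend on where the dataset lives.
sample = [
    "data/spk1/utt_0001.wav|Hello world.|spk1",
    "",
    "data/spk2/utt_0002.wav|A second example sentence.|spk2",
]
entries = parse_metadata(sample)
speakers = sorted({entry.speaker_id for entry in entries})
```

Running a check like this over the metadata files before launching trainer.py surfaces malformed lines early, rather than partway through a training run.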

Highlighted Details

  • Claims 1.5x faster inference speed than HiFi-GAN.
  • Reports better objective scores (PESQ, RMSE) than both the official UnivNet and HiFi-GAN.
  • Supports both UnivNet-c16 and UnivNet-c32 configurations.
  • Compatible with NVIDIA/tacotron2 mel-spectrogram calculations.

Maintenance & Community

  • Developed by MINDsLab Inc. with contributors listed.
  • No explicit links to community channels (Discord/Slack) or roadmaps provided in the README.

Licensing & Compatibility

  • Licensed under BSD 3-Clause License.
  • Compatible with commercial use and closed-source linking due to the permissive BSD license.

Limitations & Caveats

The hop_length parameter for mel-spectrogram calculation is fixed at 256 and cannot be changed. The implementation is noted as unofficial.
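
The fixed hop length has a practical consequence: together with the 24,000 Hz sample rate, it pins down how much audio each mel frame represents. The sketch below works through that arithmetic. The helper names are hypothetical, and it assumes each mel frame maps to exactly hop_length output samples; the exact output length in practice can differ slightly depending on padding.

```python
SAMPLE_RATE = 24000  # Hz, as required by the project
HOP_LENGTH = 256     # samples per mel frame, fixed per the caveat above


def frames_to_samples(n_frames: int) -> int:
    """Approximate number of waveform samples generated from n_frames."""
    return n_frames * HOP_LENGTH


def frames_to_seconds(n_frames: int) -> float:
    """Approximate audio duration in seconds for n_frames mel frames."""
    return frames_to_samples(n_frames) / SAMPLE_RATE


# Mel frames per second of audio: 24000 / 256 = 93.75
frame_rate = SAMPLE_RATE / HOP_LENGTH

samples_for_100_frames = frames_to_samples(100)  # 25600
seconds_for_100_frames = frames_to_seconds(100)  # about 1.067
```

An acoustic model paired with this vocoder therefore needs to emit mel-spectrograms at the same frame rate, which is why the fixed hop length matters when checking compatibility.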

Health Check

  • Last commit: 3 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 3 stars in the last 90 days
