PyTorch implementation for universal neural vocoding via large-scale training
BigVGAN is an official PyTorch implementation of a universal neural vocoder designed for high-fidelity audio synthesis. It targets researchers and developers in speech synthesis (TTS) and audio generation, offering significant improvements in audio quality and inference speed over previous models.
How It Works
BigVGAN employs a large-scale training approach with a multi-scale sub-band CQT discriminator and a multi-scale mel spectrogram loss. A key innovation is a custom fused CUDA kernel for anti-aliased activation (upsampling + activation + downsampling), which accelerates inference by 1.5-3x on an A100 GPU compared to standard PyTorch operations.
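The anti-aliased activation pattern (upsample, apply a pointwise nonlinearity, downsample) is shown below as a minimal NumPy sketch. It is not BigVGAN's PyTorch module or its fused CUDA kernel. The following choices are illustrative assumptions:

- the 2x rate factor
- the 13-tap Hann-windowed sinc low-pass filter
- the helper names lowpass_kernel, upsample2, downsample2 and anti_aliased_activation

The example nonlinearity is Snake, x + sin²(αx)/α, the periodic activation BigVGAN builds on.

```python
import numpy as np


def lowpass_kernel(num_taps: int = 13, cutoff: float = 0.25) -> np.ndarray:
    """Hann-windowed sinc FIR low-pass filter (illustrative design choice).

    `cutoff` is in cycles per sample of the 2x-upsampled signal, so 0.25
    corresponds to the Nyquist frequency of the original signal.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hanning(num_taps)
    return h / h.sum()  # normalize to unity gain at DC


def upsample2(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Zero-stuff to twice the rate, then interpolate with the low-pass filter."""
    y = np.zeros(2 * len(x))
    y[::2] = x
    # The factor 2 restores the amplitude lost to zero-stuffing;
    # an odd-length kernel with mode="same" keeps the output time-aligned.
    return 2.0 * np.convolve(y, h, mode="same")


def downsample2(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Remove content above the target Nyquist frequency, then keep every 2nd sample."""
    return np.convolve(x, h, mode="same")[::2]


def snake(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Snake activation x + sin^2(alpha * x) / alpha, used here as the example nonlinearity."""
    return x + np.sin(alpha * x) ** 2 / alpha


def anti_aliased_activation(x: np.ndarray, act=snake, num_taps: int = 13) -> np.ndarray:
    """Upsample -> pointwise activation -> downsample, for a 1-D signal."""
    h = lowpass_kernel(num_taps)
    return downsample2(act(upsample2(x, h)), h)


if __name__ == "__main__":
    t = np.arange(256)
    x = np.sin(2 * np.pi * 0.01 * t)
    y = anti_aliased_activation(x)
    print(y.shape)  # (256,)
```

Running the nonlinearity at twice the sample rate gives the harmonics it generates room above the original Nyquist frequency. The second low-pass filter removes them there instead of letting them fold back as aliases. Done as three separate steps, the upsampled intermediate is written to and read back from memory. That is the kind of overhead a fused kernel can avoid.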
Quick Start & Requirements
Install the dependencies with:
pip install -r requirements.txt
A Conda environment setup is also provided.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
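Training uses gradient clipping (clip_grad_norm) to avoid early divergence.
Building the fused CUDA kernel requires compatible nvcc and PyTorch versions; build or test failures indicate potential compatibility issues.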
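The clip_grad_norm caveat refers to gradient-norm clipping. The sketch below assumes the setting is a maximum global L2 norm, as the name suggests, and mirrors the semantics of PyTorch's torch.nn.utils.clip_grad_norm_. It uses plain NumPy arrays in place of parameter gradients. The function name clip_grad_norm here is a hypothetical helper, not the repository's code.

```python
import numpy as np


def clip_grad_norm(grads: list[np.ndarray], max_norm: float, eps: float = 1e-6) -> float:
    """Rescale float gradient arrays in place so their global L2 norm is at most max_norm.

    Returns the total norm measured before clipping, like torch.nn.utils.clip_grad_norm_.
    """
    # Global norm: square root of the sum of squared entries across all arrays
    total_norm = float(np.sqrt(sum(float(np.sum(g * g)) for g in grads)))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        for g in grads:
            g *= clip_coef  # scale in place; direction is preserved, only magnitude shrinks
    return total_norm


grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
norm_before = clip_grad_norm(grads, max_norm=1.0)
print(norm_before)  # 5.0
```

Because every array is scaled by the same factor, the update direction is unchanged. Only unusually large steps, such as loss spikes early in training, are shortened.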