PyTorch implementation of Google Brain's WaveGrad vocoder
This repository provides a PyTorch implementation of Google Brain's WaveGrad, a high-fidelity vocoder for text-to-speech synthesis. It targets researchers and developers needing a fast, high-quality waveform generation model, offering competitive real-time factors and flexible configuration for custom datasets.
How It Works
WaveGrad is a conditional generative model based on Denoising Diffusion Probabilistic Models (DDPMs). It estimates gradients of the data density (a score-matching objective) and generates waveforms by iteratively refining Gaussian noise conditioned on a mel-spectrogram. Sampling is non-autoregressive, and the number of refinement steps can be traded against fidelity, with competitive quality reached in as few as 6 iterations.
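The refinement procedure can be sketched as a standard DDPM ancestral-sampling loop. The snippet below is a minimal illustration, not the repository's code: the linear noise schedule, the hop size of 300, and the `model(y, mel, noise_level)` signature are all assumptions made for the sketch.

```python
import torch


@torch.no_grad()
def sample(model, mel, n_iters=6, device="cpu"):
    """Iteratively refine Gaussian noise into audio, conditioned on a mel-spectrogram."""
    # Placeholder linear noise schedule; the paper searches over schedules.
    betas = torch.linspace(1e-4, 0.05, n_iters, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from pure noise, one waveform sample per hop of the mel frames
    # (hop size of 300 is an assumption for this sketch).
    audio_len = mel.shape[-1] * 300
    y = torch.randn(mel.shape[0], audio_len, device=device)

    for n in reversed(range(n_iters)):
        # Continuous noise-level conditioning, as in WaveGrad.
        noise_level = alpha_bars[n].sqrt().expand(y.shape[0])
        eps = model(y, mel, noise_level)  # network predicts the injected noise
        # Standard DDPM posterior-mean update.
        y = (y - (1 - alphas[n]) / (1 - alpha_bars[n]).sqrt() * eps) / alphas[n].sqrt()
        if n > 0:
            sigma = ((1 - alpha_bars[n - 1]) / (1 - alpha_bars[n]) * betas[n]).sqrt()
            y = y + sigma * torch.randn_like(y)
    return y.clamp(-1.0, 1.0)


if __name__ == "__main__":
    # Toy stand-in for a trained network: predicts zeros instead of noise.
    dummy = lambda y, mel, noise_level: torch.zeros_like(y)
    mel = torch.randn(1, 80, 32)          # [batch, n_mels, frames]
    print(sample(dummy, mel).shape)       # torch.Size([1, 9600])
```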
Quick Start & Requirements
Install the Python dependencies with:

```bash
pip install -r requirements.txt
```
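Once the requirements are installed, inference roughly follows the refinement loop sketched above. The outline below is hypothetical: the module path, class name, checkpoint layout, and `generate` method are assumptions rather than the repository's documented API, so check the repository's README and notebooks for the real entry points.

```python
# Hypothetical usage outline -- every name below (module path, class,
# checkpoint keys, generate()) is an assumption, not the confirmed API.
import torch
from model import WaveGrad  # assumed import path

device = "cuda" if torch.cuda.is_available() else "cpu"
model = WaveGrad().to(device).eval()                    # assumed constructor
state = torch.load("checkpoints/wavegrad.pt", map_location=device)
model.load_state_dict(state.get("model", state))        # assumed checkpoint layout

mel = torch.load("mel.pt").to(device)                   # [batch, n_mels, frames]
with torch.no_grad():
    audio = model.generate(mel)                         # assumed generation entry point
```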
Highlighted Details
Maintenance & Community
The project's last major update was in October 2020, and there is no indication of ongoing maintenance or active community support.
Licensing & Compatibility
Limitations & Caveats
The README notes that training can sometimes exhibit unstable loss behavior, requiring careful tuning of the learning-rate schedule and gradient clipping.
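As a rough illustration of the kind of stabilization the README points to, the sketch below combines gradient clipping with a stepwise learning-rate schedule. It uses a toy model and random data so it runs standalone; the hyperparameters and the loss are placeholders, not the repository's training configuration.

```python
import torch

# Toy stand-in for the WaveGrad network and data, so the loop runs as-is.
model = torch.nn.Linear(80, 80)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.5)

for step in range(10):                      # a real run would iterate over a dataset
    batch = torch.randn(16, 80)
    loss = torch.nn.functional.mse_loss(model(batch), batch)  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    # Clip gradient norms to curb the occasional loss spikes the README mentions.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```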