WaveGrad by ivanvovk

PyTorch implementation of Google Brain's WaveGrad vocoder

Created 5 years ago

407 stars

Top 71.6% on SourcePulse

2 Experts Love This Project

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

Ajay Jain

Cofounder of Genmo

Project Summary

This repository provides a PyTorch implementation of Google Brain's WaveGrad, a high-fidelity vocoder for text-to-speech synthesis. It targets researchers and developers needing a fast, high-quality waveform generation model, offering competitive real-time factors and flexible configuration for custom datasets.

How It Works

WaveGrad is a conditional waveform generation model based on Denoising Diffusion Probabilistic Models (DDPMs). It estimates gradients of the data density and, starting from Gaussian white noise, iteratively refines the signal conditioned on a mel-spectrogram, reaching sample quality comparable to WaveNet. Compared with classic DDPMs it needs far fewer sampling iterations: as few as 6 give competitive quality.
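The iterative refinement described above can be illustrated with a generic DDPM-style sampling loop. This is a minimal sketch in plain Python, not the repository's code: the function names, the linear noise schedule, its endpoint values, and the `dummy_predict_noise` stand-in for the trained network are all assumptions made for illustration. In a real vocoder the noise predictor would also receive the mel-spectrogram.

```python
import math
import random


def linear_beta_schedule(n_iter, beta_start=1e-6, beta_end=1e-2):
    """Hypothetical linear noise schedule; the endpoint values are placeholders."""
    if n_iter == 1:
        return [beta_end]
    step = (beta_end - beta_start) / (n_iter - 1)
    return [beta_start + i * step for i in range(n_iter)]


def sample(predict_noise, n_samples, betas, rng):
    """Generic DDPM ancestral sampling: start from white noise, then refine."""
    alphas = [1.0 - b for b in betas]
    # Cumulative products of alpha give the overall signal level at each step.
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from Gaussian white noise.
    y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    for t in reversed(range(len(betas))):
        # The network is conditioned on a continuous noise level (assumed here
        # to be sqrt(alpha_bar)); conditioning features are omitted in this sketch.
        eps = predict_noise(y, math.sqrt(alpha_bars[t]))
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Remove the predicted noise and rescale.
        y = [(yi - coef * ei) / math.sqrt(alphas[t]) for yi, ei in zip(y, eps)]
        if t > 0:
            # Re-inject a smaller amount of noise on all but the final step.
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            sigma = math.sqrt(var)
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y


def dummy_predict_noise(y, noise_level):
    """Stand-in for a trained network: predicts zero noise everywhere."""
    return [0.0] * len(y)


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule(6)  # 6 refinement iterations
    waveform = sample(dummy_predict_noise, 16, betas, rng)
    print(len(waveform))
```

The number of loop iterations equals the length of the noise schedule, which is why reducing the schedule to a handful of steps directly reduces inference cost.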

Quick Start & Requirements

Install dependencies with pip install -r requirements.txt.
Requires Python and PyTorch. Mixed-precision training is supported.
Training can run on a single 12GB GPU.
See generated_samples for examples.

Highlighted Details

Achieves real-time factor (RTF) of 0.04 on RTX 2080 Ti with 6-iteration inference.
Training is stable and fast, with mixed-precision and distributed training supported.
Offers CLI inference and flexible architecture configuration.
Includes a pretrained checkpoint on the LJSpeech dataset.

Maintenance & Community

Last updated October 2020 with significant feature additions.
No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

The last major update was in October 2020, and there is no sign of ongoing maintenance or active community support. The README notes that the training loss can sometimes behave unstably, which calls for careful tuning of the learning rate scheduler and gradient clipping.
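The two stabilizing knobs mentioned here, gradient clipping and a learning rate schedule, can be sketched in plain Python as follows. This is only an illustration of the general techniques, with hypothetical function names and values; it is not the repository's training code. In PyTorch the same roles are typically filled by `torch.nn.utils.clip_grad_norm_` and a scheduler from `torch.optim.lr_scheduler`.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their combined L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads), total
    scale = max_norm / total
    return [g * scale for g in grads], total


def step_decay(base_lr, step, decay_every, gamma):
    """Multiply the learning rate by gamma once every decay_every steps."""
    return base_lr * gamma ** (step // decay_every)


if __name__ == "__main__":
    # Hypothetical gradient spike: norm 5.0 is clipped down to 1.0.
    clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    print(norm_before, clipped)
    # Hypothetical schedule: halve the learning rate every 100 steps.
    print(step_decay(1e-3, step=250, decay_every=100, gamma=0.5))
```

Clipping bounds the size of any single update when the loss spikes, while the decaying schedule shrinks steps later in training; suitable thresholds and decay rates depend on the dataset and have to be found empirically.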

Health Check

Last Commit

4 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

3 stars in the last 30 days