WaveGrad by ivanvovk

PyTorch implementation of Google Brain's WaveGrad vocoder

created 4 years ago
403 stars

Top 73.0% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Google Brain's WaveGrad, a high-fidelity vocoder for text-to-speech synthesis. It targets researchers and developers needing a fast, high-quality waveform generation model, offering competitive real-time factors and flexible configuration for custom datasets.

How It Works

WaveGrad is a conditional generative model for waveform generation, built on Denoising Diffusion Probabilistic Models (DDPMs). It synthesizes audio by estimating gradients of the data density and iteratively refining a noise signal, with sampling quality comparable to WaveNet. Compared with classic DDPMs it needs far fewer sampling iterations: as few as 6 are enough for competitive quality.
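
As a rough illustration of the iterative refinement described above, the following Python sketch runs a DDPM-style sampling loop over 6 steps. It is not the repository's code: the linear noise schedule, its endpoint values, and dummy_predict_noise (a stand-in for the trained network, which in a real vocoder would also be conditioned on a mel-spectrogram) are assumptions made for the example.

```python
import math
import random


def linear_beta_schedule(n_steps, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear noise schedule; the endpoint values are illustrative."""
    if n_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def sample(predict_noise, length, betas, rng):
    """DDPM-style ancestral sampling, starting from pure Gaussian noise.

    predict_noise(y, noise_level) is a placeholder for the trained model.
    """
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    y = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in reversed(range(len(betas))):
        # The model estimates the noise present in the current signal.
        eps = predict_noise(y, math.sqrt(alpha_bars[t]))
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Remove the estimated noise and rescale (posterior mean).
        y = [(yi - coef * ei) / math.sqrt(alphas[t]) for yi, ei in zip(y, eps)]
        if t > 0:
            # Re-inject a smaller amount of noise on all but the last step.
            sigma = math.sqrt(
                (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            )
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y


def dummy_predict_noise(y, noise_level):
    """Assumption: the clean signal is silence, so the whole input is noise."""
    scale = math.sqrt(1.0 - noise_level ** 2)
    return [yi / scale for yi in y]


waveform = sample(dummy_predict_noise, 16, linear_beta_schedule(6), random.Random(0))
```

With this placeholder predictor the loop converges to a signal close to zero (silence), which only shows that the update rule is wired correctly. A trained network would instead steer the samples toward speech.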

Quick Start & Requirements

  • Install via pip install -r requirements.txt.
  • Requires Python and PyTorch. Mixed-precision training is supported.
  • Training can run on a single 12GB GPU.
  • See generated_samples for examples.

Highlighted Details

  • Achieves real-time factor (RTF) of 0.04 on RTX 2080 Ti with 6-iteration inference.
  • Supports stable and fast training with mixed-precision and distributed training.
  • Offers CLI inference and flexible architecture configuration.
  • Includes a pretrained checkpoint on the LJSpeech dataset.
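
To make the RTF figure above concrete: real-time factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The helper names in this small Python sketch are hypothetical and are not part of the repository.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf, audio_seconds):
    """Estimated wall-clock time to generate a clip at a given RTF."""
    return rtf * audio_seconds


# At the reported RTF of 0.04, a 10-second clip would take roughly 0.4 s.
estimate = synthesis_time(0.04, 10.0)
```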

Maintenance & Community

  • Last updated October 2020 with significant feature additions.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

The project's last major update was in October 2020, and there's no indication of ongoing maintenance or active community support. The README notes that training can sometimes exhibit unstable loss behavior, requiring careful tuning of learning rate schedulers and gradient clipping.
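
For readers unfamiliar with the two stabilizers mentioned above, here is a framework-free Python sketch of global-norm gradient clipping and a step-decay learning rate schedule. The function names and all numeric values are assumptions chosen for illustration and do not reflect the repository's settings; in PyTorch the usual tools for this are torch.nn.utils.clip_grad_norm_ and the schedulers in torch.optim.lr_scheduler.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale gradients down so their L2 norm does not exceed max_norm.

    Returns the (possibly rescaled) gradients and the original norm.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads), total
    scale = max_norm / total
    return [g * scale for g in grads], total


def step_decay_lr(base_lr, step, decay_every, gamma):
    """Multiply the learning rate by gamma once every decay_every steps."""
    return base_lr * gamma ** (step // decay_every)


# A gradient with norm 5 is rescaled to norm 1; its direction is unchanged.
clipped, original_norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Logging the pre-clipping norm returned here is a simple way to spot the loss spikes the README warns about.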

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
1 star in the last 90 days
