wavegrad by lmnt-com

Neural vocoder for high-quality waveform generation from spectrograms

Created 5 years ago
290 stars

Top 90.8% on SourcePulse

Project Summary

WaveGrad is a neural vocoder that converts Mel spectrograms into high-quality audio waveforms through iterative refinement. It is designed for fast, high-fidelity speech synthesis, targeting researchers and developers in audio processing and speech synthesis.

How It Works

WaveGrad is a diffusion-style model that generates waveforms by estimating gradients of the data density. Starting from Gaussian noise, it iteratively refines the signal into a waveform by applying learned denoising steps conditioned on the input mel spectrogram. The noise schedule sets the number of refinement steps, so adjusting it trades inference speed against output quality.
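
The refinement loop described above can be sketched in plain Python. This is a minimal illustration of DDPM-style sampling under stated assumptions, not the repository's code: the names `refine` and `predict_noise` are hypothetical, and `predict_noise` stands in for the trained network that estimates the noise in the current signal given the spectrogram.

```python
import math
import random


def refine(spectrogram, num_samples, betas, predict_noise, seed=0):
    """Turn Gaussian noise into a waveform with DDPM-style denoising steps.

    `predict_noise(y, spectrogram, noise_level)` is a hypothetical stand-in
    for the trained model; it returns the estimated noise contained in `y`.
    """
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]

    # Cumulative products of alphas: how much clean signal survives to step n.
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Start from pure Gaussian noise.
    y = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

    # Walk the schedule from the noisiest step back to the cleanest.
    for n in reversed(range(len(betas))):
        noise_level = math.sqrt(alpha_bars[n])
        eps = predict_noise(y, spectrogram, noise_level)

        # Remove the predicted noise and rescale.
        c = betas[n] / math.sqrt(1.0 - alpha_bars[n])
        y = [(yi - c * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]

        if n > 0:
            # Re-inject a smaller amount of noise on all but the final step.
            sigma = math.sqrt(
                (1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n]
            )
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y
```

The loop runs once per entry in `betas`, which is why a shorter schedule translates directly into faster inference.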

Quick Start & Requirements

  • Install via pip: pip install wavegrad or from source.
  • Requires Python and a GPU for efficient training and inference.
  • Training requires a dataset of 16-bit mono WAV files.
  • Preprocessed data and trained models are available.
  • Official documentation and audio samples are linked in the README.
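
Because training expects 16-bit mono WAV files, it can help to validate a dataset before starting. The following is a small sketch using only Python's standard `wave` module; the helper name `is_16bit_mono` is hypothetical and not part of the wavegrad package.

```python
import wave


def is_16bit_mono(path):
    """Return True if the WAV file at `path` is 16-bit PCM with one channel."""
    with wave.open(path, "rb") as wav:
        # Sample width is reported in bytes, so 2 bytes means 16 bits.
        return wav.getsampwidth() == 2 and wav.getnchannels() == 1
```

Files that fail the check would need to be converted before being passed to training.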

Highlighted Details

  • Achieves high-quality synthesis with a diffusion model architecture.
  • Supports custom noise schedules for faster-than-real-time inference (as few as 6 iterations).
  • Includes command-line and programmatic inference APIs.
  • Supports mixed-precision and multi-GPU training.
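
To make the idea of a custom noise schedule concrete, the sketch below builds a long linear schedule and a short 6-step one, then computes how much of the clean signal each leaves at its noisiest step. The function names and all numeric values are illustrative assumptions, not the schedules shipped with or recommended by the project.

```python
import math


def linear_betas(start, end, steps):
    """Return `steps` evenly spaced beta values from `start` to `end`."""
    if steps == 1:
        return [start]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def remaining_signal(betas):
    """Return the square root of the cumulative product of (1 - beta).

    This is the scale of the clean signal left after every noising step.
    """
    product = 1.0
    for b in betas:
        product *= 1.0 - b
    return math.sqrt(product)


# Illustrative values only (assumptions, not the project's defaults).
long_schedule = linear_betas(1e-4, 0.02, 1000)
short_schedule = [1e-4, 1e-3, 1e-2, 5e-2, 0.2, 0.5]

print(len(long_schedule), remaining_signal(long_schedule))
print(len(short_schedule), remaining_signal(short_schedule))
```

A short schedule has to use much larger per-step values to cover a comparable noise range, which is one reason hand-tuning it can take some trial and error.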

Maintenance & Community

The WaveGrad method originated at Google Brain; this repository is an implementation hosted by lmnt-com. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Users should verify the licensing terms in the repository before commercial or closed-source use.

Limitations & Caveats

The project was last updated in 2020. While marked as stable for training and synthesis, the lack of recent activity may indicate limited ongoing development or support. Finding optimal custom noise schedules may require additional effort.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
