wavegrad by lmnt-com

Neural vocoder for high-quality waveform generation from spectrograms

created 4 years ago
288 stars

Top 92.1% on sourcepulse

Project Summary

WaveGrad is a neural vocoder that converts mel spectrograms into high-quality audio waveforms through iterative refinement. It targets researchers and developers in audio processing who need fast, high-fidelity speech synthesis.

How It Works

WaveGrad takes a diffusion-model approach: it learns to estimate gradients of the data density and uses them to generate waveforms. Starting from a noise signal, it applies a sequence of learned denoising steps, each conditioned on the input mel spectrogram, until a waveform emerges. The noise schedule sets the number of steps, so a shorter schedule gives faster inference while a longer one allows more refinement.
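
The refinement loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: `predict_noise` is a hypothetical placeholder for the trained network, `refine` and its parameters are names chosen for this example, and the update rule is the standard diffusion sampling step.

```python
import math
import random


def predict_noise(y, mel, noise_level):
    """Hypothetical placeholder for the trained network.

    A real model would estimate the noise in `y` given the mel
    spectrogram and the current noise level. This stub returns zeros.
    """
    return [0.0] * len(y)


def refine(mel, betas, num_samples, predict=predict_noise, seed=0):
    """Turn Gaussian noise into a waveform, one denoising step per beta."""
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]

    # Cumulative products of alpha give the signal level kept at each step.
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Start from pure noise.
    y = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

    # Walk the schedule from the noisiest step back to the cleanest.
    for n in reversed(range(len(betas))):
        eps = predict(y, mel, math.sqrt(alpha_bars[n]))
        scale = betas[n] / math.sqrt(1.0 - alpha_bars[n])
        y = [(yi - scale * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]
        if n > 0:
            # Re-inject a smaller amount of noise on every step but the last.
            sigma = math.sqrt(
                (1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n]
            )
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y


# Illustrative six-step schedule; these values are assumptions, not tuned.
samples = refine(mel=None, betas=[1e-4, 1e-3, 1e-2, 5e-2, 0.2, 0.5], num_samples=8)
```

With the zero-returning stub the output is only rescaled noise. The point of the sketch is the control flow: the length of `betas` is the number of refinement steps, which is where the speed and quality trade-off comes from.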

Quick Start & Requirements

  • Install with pip (pip install wavegrad) or from source.
  • Requires Python; a GPU is needed for efficient training and inference.
  • Training requires a dataset of 16-bit mono WAV files.
  • Preprocessed data and trained models are available.
  • Official documentation and audio samples are linked in the README.
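
Because training expects 16-bit mono WAV files, it can help to check a dataset before starting a run. The sketch below uses only Python's standard `wave` module. The helper names `is_16bit_mono` and `find_invalid_wavs` are hypothetical and not part of wavegrad, and the recursive directory search is an assumption about how a dataset might be laid out.

```python
import wave
from pathlib import Path


def is_16bit_mono(path):
    """Return True if the file is a PCM WAV with 1 channel and 2-byte samples."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getnchannels() == 1 and wav.getsampwidth() == 2
    except wave.Error:
        # Not a WAV file the standard library can parse (e.g. non-PCM data).
        return False


def find_invalid_wavs(root):
    """List .wav files under `root` that are not 16-bit mono, sorted by path."""
    return sorted(
        str(p) for p in Path(root).rglob("*.wav") if not is_16bit_mono(p)
    )
```

Calling `find_invalid_wavs` on a dataset directory returns the files that would need converting; an empty list means every `.wav` file found matches the stated format.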

Highlighted Details

  • Achieves high-quality synthesis with a diffusion model architecture.
  • Supports custom noise schedules for faster-than-real-time inference (as few as 6 iterations).
  • Includes command-line and programmatic inference APIs.
  • Supports mixed-precision and multi-GPU training.
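
To make the idea of a short custom noise schedule concrete, the sketch below builds a six-step schedule and shows the noise level after each step. It is an illustration under assumptions: the geometric spacing, the endpoint values, and the function names `geometric_schedule` and `noise_levels` are chosen for this example and are not taken from the repository.

```python
import math


def geometric_schedule(beta_start, beta_end, steps):
    """Return `steps` beta values spaced geometrically between the endpoints."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ratio = (beta_end / beta_start) ** (1.0 / (steps - 1))
    return [beta_start * ratio ** i for i in range(steps)]


def noise_levels(betas):
    """Return sqrt of the cumulative product of (1 - beta) for each step.

    Values near 1 mean mostly signal; values near 0 mean mostly noise.
    """
    levels = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        levels.append(math.sqrt(alpha_bar))
    return levels


# Hypothetical six-step schedule; the endpoints are illustrative, not tuned.
betas = geometric_schedule(1e-4, 0.5, 6)
levels = noise_levels(betas)
```

Each entry in `betas` corresponds to one inference iteration, so a six-entry schedule means six passes through the model. Whether a given schedule preserves quality has to be checked by listening or by evaluation on held-out data.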

Maintenance & Community

The WaveGrad model was designed by researchers at Google Brain; this repository, hosted by lmnt-com, is an implementation of that model. The README does not detail community channels or a roadmap.

Licensing & Compatibility

The README does not explicitly state a license. Users should verify the licensing terms in the repository before commercial or closed-source use.

Limitations & Caveats

The project was last updated in 2020. While marked as stable for training and synthesis, the lack of recent activity may indicate limited ongoing development or support. Finding optimal custom noise schedules may require additional effort.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days

Explore Similar Projects

Starred by Tim J. Baek (Founder of Open WebUI), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

StyleTTS2 by yl4579

0.2%
6k
Text-to-speech model achieving human-level synthesis
created 2 years ago
updated 11 months ago