diffwave  by lmnt-com

Neural vocoder and waveform synthesizer

Created 5 years ago
861 stars

Top 41.6% on SourcePulse

Project Summary

DiffWave is a neural vocoder and waveform synthesizer that generates high-quality audio from conditioning signals like Mel spectrograms. It iteratively refines Gaussian noise into speech, offering fast inference and stable training. The project is suitable for researchers and developers working on speech synthesis and audio generation.

How It Works

DiffWave is a diffusion model. It starts from random Gaussian noise and refines it over a sequence of learned denoising steps, guided by the conditioning signal, until it produces an audio waveform. The approach targets high-fidelity waveform generation and gives one framework for both conditional and unconditional synthesis.
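
To make the refinement loop concrete, here is a minimal sketch of a standard denoising-diffusion sampling loop in pure Python. It is not taken from the DiffWave code base: predict_noise is a hypothetical stand-in for the trained network (it simply returns zeros), and the noise schedule, step count, and sample count are illustrative assumptions.

```python
import math
import random


def predict_noise(x, t, conditioning):
    # Hypothetical stand-in for the trained network. A real model would
    # return its estimate of the noise present in x at step t.
    return [0.0] * len(x)


def sample(conditioning, num_samples, betas, seed=0):
    """Run the reverse (denoising) process from pure noise down to step 0."""
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]

    # Cumulative products of the alphas, one per diffusion step.
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    # Start from Gaussian noise with the length of the target waveform.
    x = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t, conditioning)
        # Subtract the scaled noise estimate, then rescale (posterior mean).
        scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [(xi - scale * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a smaller amount of noise on every step but the last.
            sigma = math.sqrt(
                (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            )
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]

    # Keep the result in the usual floating-point audio range.
    return [max(-1.0, min(1.0, xi)) for xi in x]


steps = 50  # illustrative; not the project's setting
betas = [1e-4 + (0.05 - 1e-4) * i / (steps - 1) for i in range(steps)]
audio = sample(conditioning=None, num_samples=256, betas=betas)
```

With a trained network in place of predict_noise, each pass through the loop would remove a little more noise, which is the "progressive refinement" described above. The output here is only clipped noise, because the stand-in predicts nothing.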

Quick Start & Requirements

  • Install with pip (pip install diffwave) or from source.
  • Requires Python and PyTorch.
  • Pretrained models and audio samples are available.
  • See official documentation for detailed setup and training.
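
After installing, a quick way to confirm the requirements are visible to your interpreter is a small check like the one below. It is a sketch using only the standard library. The import names torch and diffwave are assumptions based on the package names above, and check_environment is a hypothetical helper, not part of the project.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "diffwave")):
    """Report which modules can be located, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" result usually means the package was installed into a different environment than the one running the script.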

Highlighted Details

  • Achieves a real-time factor of 0.87 for speech synthesis using a pretrained model.
  • Supports fast sampling, stable training, high-quality synthesis, and mixed-precision/multi-GPU training.
  • Offers both command-line and programmatic inference APIs.
  • Includes unconditional waveform synthesis capabilities.
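
For readers unfamiliar with the real-time factor, the sketch below shows how it is commonly computed: synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. The function name, the 22,050 Hz sample rate, and the timing figures are illustrative assumptions, not measurements from the project.

```python
def real_time_factor(synthesis_seconds, num_samples, sample_rate):
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


# Hypothetical run: 10 seconds of audio at 22,050 Hz generated in 8.7 seconds,
# which gives a real-time factor of roughly 0.87.
rtf = real_time_factor(synthesis_seconds=8.7, num_samples=220500, sample_rate=22050)
```

Under this convention, a factor of 0.87 means that each second of audio takes about 0.87 seconds to generate on the hardware used for the measurement.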

Maintenance & Community

The project has received contributions and pointers from the lead author of the DiffWave paper. The most recent updates to the repository date from late 2021.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README does not specify compatibility with different operating systems or hardware beyond mentioning GPU training. The project's last update was in late 2021, so newer PyTorch versions or dependencies might require adjustments.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Ajay Jain (Cofounder of Genmo).

WaveGrad by ivanvovk

0%
403
PyTorch implementation of Google Brain's WaveGrad vocoder
Created 5 years ago
Updated 4 years ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 3 more.

tacotron2 by NVIDIA

0.0%
5k
PyTorch implementation for text-to-speech synthesis
Created 7 years ago
Updated 1 year ago