wavenet_vocoder by r9y9

WaveNet vocoder for high-quality raw speech sample generation

Created 8 years ago

2,372 stars

Top 19.0% on SourcePulse

3 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

Ajay Jain

Cofounder of Genmo

Jong Wook Kim

Research Scientist at OpenAI

Project Summary

This repository provides an implementation of the WaveNet vocoder, which generates high-quality raw speech samples conditioned on linguistic or acoustic features. It is aimed at researchers and developers working on text-to-speech (TTS) systems who need a flexible and efficient vocoder. Its main benefit is high-fidelity speech synthesis, achieved through local and global conditioning and mixture-distribution modeling of the output waveform.

How It Works

The WaveNet vocoder models 16-bit raw audio with a choice of output distribution: Mixture of Logistics (MoL), Mixture of Gaussians (MoG), or a single Gaussian. This lets the model assign probabilities across the full 16-bit sample range. The implementation emphasizes local and global conditioning, which is essential for vocoder quality, and speeds up inference by caching intermediate states in convolutions, similar to the Fast WaveNet Generation Algorithm (arXiv:1611.09482).
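To make the mixture-of-logistics idea concrete, the following is a minimal sketch of how a discretized MoL can assign a probability to one quantized sample. It follows the common formulation in which each bin receives the logistic CDF mass between its edges; it is not the repository's code, and the function names and example parameters (`bin_probability`, `negative_log_likelihood`, the weights, means, and scales) are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic CDF, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bin_probability(index, weights, means, scales, num_classes=65536):
    """Probability a mixture of logistics assigns to one quantization bin.

    index is an integer in [0, num_classes - 1]; bin centres are mapped
    linearly onto [-1, 1]. weights must sum to 1 and scales must be > 0.
    """
    half = 1.0 / (num_classes - 1)  # half the spacing between bin centres
    x = 2.0 * index / (num_classes - 1) - 1.0
    total = 0.0
    for w, mu, s in zip(weights, means, scales):
        # The edge bins absorb the tails so the bins sum to exactly 1.
        upper = 1.0 if index == num_classes - 1 else sigmoid((x + half - mu) / s)
        lower = 0.0 if index == 0 else sigmoid((x - half - mu) / s)
        total += w * (upper - lower)
    return total


def negative_log_likelihood(index, weights, means, scales, num_classes=65536):
    """Per-sample training loss under the sketch above."""
    p = bin_probability(index, weights, means, scales, num_classes)
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


# Hypothetical two-component mixture, purely for illustration.
WEIGHTS = [0.7, 0.3]
MEANS = [-0.2, 0.5]
SCALES = [0.05, 0.1]
```

In a trained model the weights, means, and scales would be predicted by the network for every time step; here they are fixed constants so the arithmetic can be checked by hand.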

Quick Start & Requirements

Install: pip install -e . from a clone of the repository, or pip install wavenet_vocoder for library-only use.
Requirements: Python 3, CUDA >= 8.0, PyTorch >= v0.4.0.
Usage: The repository includes ESPnet-style recipes for data preprocessing, training, and inference. Pre-trained models for LJSpeech and CMU ARCTIC are available.
Docs: https://r9y9.github.io/wavenet_vocoder/
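The version requirements above can be checked before installing. The sketch below is one hedged way to do that: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of wavenet_vocoder, and the only PyTorch attributes it reads are `torch.__version__` and `torch.version.cuda`.

```python
import re


def parse_version(text):
    """Turn '2.1.0+cu118' or 'v0.4.0' into a comparable tuple such as (2, 1, 0)."""
    core = text.lstrip("v").split("+")[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '8.0' compares equal to '8.0.0'
        parts.append(0)
    return tuple(parts)


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def check_environment():
    """Report whether the listed PyTorch and CUDA minimums appear to be met."""
    report = {}
    try:
        import torch
    except ImportError:
        report["pytorch"] = "not installed"
        return report
    report["pytorch"] = "ok" if meets_minimum(torch.__version__, "0.4.0") else "too old"
    cuda = torch.version.cuda  # None on CPU-only builds
    if cuda is None:
        report["cuda"] = "CPU-only build"
    else:
        report["cuda"] = "ok" if meets_minimum(cuda, "8.0") else "too old"
    return report
```

Calling `check_environment()` returns a small dictionary rather than raising, so it can be run on machines where PyTorch is not yet installed.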

Highlighted Details

Supports 16-bit raw audio modeling with various mixture distributions (MoL, MoG, Gaussian).
Offers fast inference via convolutional state caching.
Integrates with the ESPnet toolkit for end-to-end TTS systems.
Provides pre-trained models and Kaldi-style recipes for reproducibility.
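The state-caching idea behind fast inference can be illustrated with a toy example. The sketch below compares naive autoregressive generation, which recomputes every dilated causal convolution over the full history at each step, with a cached version that keeps a short queue of past inputs per layer. It is a simplification under stated assumptions: scalar channels, kernel size 2, a plain tanh in place of WaveNet's gated activations, no residual or skip connections, and made-up weights. None of the names come from the repository.

```python
import math
from collections import deque

# Hypothetical toy network: one (w_past, w_now) pair per dilated layer.
DILATIONS = [1, 2, 4, 8]
WEIGHTS = [(0.5, -0.3), (0.2, 0.7), (-0.4, 0.6), (0.3, 0.1)]


def layer_full(xs, dilation, w_past, w_now):
    """Dilated causal convolution over a whole sequence, zero padded on the left."""
    out = []
    for t, x in enumerate(xs):
        past = xs[t - dilation] if t >= dilation else 0.0
        out.append(math.tanh(w_past * past + w_now * x))
    return out


def naive_next(history):
    """Recompute every layer over the full history; return the newest output."""
    xs = list(history)
    for dilation, (w_past, w_now) in zip(DILATIONS, WEIGHTS):
        xs = layer_full(xs, dilation, w_past, w_now)
    return xs[-1]


class CachedStack:
    """Same network, but each layer remembers only its last `dilation` inputs."""

    def __init__(self):
        self.queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]

    def step(self, x):
        for queue, (w_past, w_now) in zip(self.queues, WEIGHTS):
            past = queue[0]   # this layer's input from `dilation` steps ago
            queue.append(x)   # maxlen discards the oldest entry automatically
            x = math.tanh(w_past * past + w_now * x)
        return x


def generate_naive(seed, steps):
    """Autoregressive generation; work per step grows with the history length."""
    history = [seed]
    for _ in range(steps):
        history.append(naive_next(history))
    return history[1:]


def generate_cached(seed, steps):
    """Autoregressive generation; work per step is constant in the history length."""
    stack = CachedStack()
    x, out = seed, []
    for _ in range(steps):
        x = stack.step(x)
        out.append(x)
    return out
```

Both generators produce the same sequence. The cached version touches one queue entry per layer at each step instead of reprocessing the entire history, which is where the speed-up comes from.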

Maintenance & Community

The project is maintained by r9y9, though activity has slowed: the health check reports the last commit as 2 years ago and responsiveness as inactive. Integration with ESPnet suggests a connection to a broader research community. The README does not list other community channels.

Licensing & Compatibility

The provided README does not explicitly state a license. Confirm the licensing terms in the repository itself before commercial use or closed-source linking.

Limitations & Caveats

The README describes the current code as a development version; the stable release is v0.1.1. The provided recipes do not support multi-speaker WaveNet via global conditioning, although the README notes that it should not be difficult to implement. Some command-line tools, such as synthesis.py, are flagged as possibly not working, and evaluate.py is recommended instead.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days