PyTorch implementation of NaturalSpeech 2, a zero-shot speech and singing synthesizer
This repository provides a PyTorch implementation of NaturalSpeech 2, a zero-shot speech and singing synthesizer. It is aimed at ML engineers and researchers working on text-to-speech (TTS). The model pairs a neural audio codec with a latent diffusion model that generates speech non-autoregressively, with the goal of natural, expressive output.
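A neural audio codec such as Encodec compresses audio with residual vector quantization (RVQ): each stage quantizes what the previous stages left over. The resulting representation can be read either as per-stage discrete codes or as one continuous vector (the sum of the selected codebook entries), and NaturalSpeech 2 works with continuous latents rather than discrete tokens. The sketch below is a minimal illustration under stated assumptions: the codebooks are random toy stand-ins for trained ones, and `residual_vq` is a hypothetical helper, not part of naturalspeech2-pytorch or Encodec.

```python
import numpy as np


def residual_vq(x, codebooks):
    """Hypothetical RVQ sketch: quantize x (frames, dim) with a stack of codebooks (size, dim).

    Returns the discrete code indices per stage, shape (stages, frames), and the
    continuous quantized vector, i.e. the sum of the chosen entries across stages.
    """
    residual = x.copy()
    quantized = np.zeros_like(x)
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from each frame's residual to every codebook entry.
        d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = d.argmin(axis=1)
        chosen = cb[idx]
        quantized += chosen   # continuous latent accumulates the chosen entries
        residual -= chosen    # the next stage quantizes what is left over
        codes.append(idx)
    return np.stack(codes, axis=0), quantized


# Toy usage: 3 stages x 16 random entries stand in for trained codec codebooks.
rng = np.random.default_rng(0)
frames, dim = 6, 4
codebooks = [rng.standard_normal((16, dim)) for _ in range(3)]
x = rng.standard_normal((frames, dim))
codes, z = residual_vq(x, codebooks)
print(codes.shape, z.shape)  # (3, 6) (6, 4)
```

A diffusion model trained on `z` regresses real-valued vectors, whereas a token-based model would predict the integer `codes` stage by stage.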
How It Works
The system runs a latent diffusion model on continuous latent vectors produced by a neural audio codec (Encodec). Because the diffusion model refines the whole latent sequence in parallel rather than emitting it one step at a time, generation is non-autoregressive, which contributes to both naturalness and efficiency. The implementation centers on denoising diffusion and incorporates improvements to the transformer components, aiming for state-of-the-art performance.
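The denoising diffusion loop described above can be sketched in NumPy under explicit assumptions: a DDPM-style linear noise schedule, a denoiser that predicts the clean latent directly, and a deterministic DDIM-style sampler. The names `linear_alpha_bar`, `q_sample`, `training_loss`, and `sample` are hypothetical and do not come from naturalspeech2-pytorch; the repository's actual schedule, parameterization, and conditioning inputs (omitted here) may differ. What the sketch does show is the non-autoregressive shape of generation: every latent frame is updated together at each denoising step.

```python
import numpy as np


def linear_alpha_bar(num_steps):
    """Cumulative signal level abar_t for a linear beta schedule (a DDPM-style assumption)."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(z0, t, alpha_bar, noise):
    """Forward process: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bar[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * noise


def training_loss(denoiser, z0, alpha_bar, rng):
    """One training step: noise the whole latent sequence at a random t, regress the clean latent."""
    t = int(rng.integers(len(alpha_bar)))
    noise = rng.standard_normal(z0.shape)
    z_t = q_sample(z0, t, alpha_bar, noise)
    z0_hat = denoiser(z_t, t)  # assumed x0-parameterization: the network predicts z0 directly
    return float(np.mean((z0_hat - z0) ** 2))


def sample(denoiser, shape, alpha_bar, rng):
    """Deterministic DDIM-style sampler; all frames are refined in parallel at every step."""
    z = rng.standard_normal(shape)
    for t in range(len(alpha_bar) - 1, -1, -1):
        a_t = alpha_bar[t]
        z0_hat = denoiser(z, t)
        # Noise implied by the current estimate of the clean latent.
        eps_hat = (z - np.sqrt(a_t) * z0_hat) / np.sqrt(1.0 - a_t)
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        z = np.sqrt(a_prev) * z0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return z


# Toy usage: 100 latent frames x 8 channels, with a hypothetical "oracle" denoiser
# that always returns the target (a stand-in for the real conditioned network).
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar(50)
target = rng.standard_normal((100, 8))
oracle = lambda z_t, t: target
print(training_loss(oracle, target, alpha_bar, rng))  # 0.0
z = sample(oracle, target.shape, alpha_bar, rng)
print(np.allclose(z, target))  # True
```

Predicting the clean latent makes the final sampling step return the model's estimate directly; a noise-prediction variant would instead derive `z0_hat` from a predicted `eps_hat`.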
Quick Start & Requirements
pip install naturalspeech2-pytorch
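Requires PyTorch and the naturalspeech2-pytorch library; a CUDA-capable GPU is implied by the .cuda() calls in the examples. Usage examples, including for the Trainer class, are provided in the README.
Highlighted Details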
A Trainer class for simplified training and sampling loops.
Maintenance & Community
Uses the accelerate library.
Licensing & Compatibility
Limitations & Caveats