Text-to-speech model for zero-shot style transfer of custom voice
GenerSpeech is a PyTorch implementation of a text-to-speech (TTS) model designed for zero-shot style transfer of out-of-domain custom voices. It targets researchers and developers working on expressive and customizable speech synthesis, enabling high-fidelity audio generation with novel voice characteristics.
How It Works
GenerSpeech employs a multi-level style transfer approach that improves generalization to unseen voice styles. It combines an acoustic model, a neural vocoder (HiFi-GAN), and an emotion encoder to produce expressive, high-fidelity speech. This architecture enables zero-shot style transfer by conditioning speech generation on a reference audio sample from the target voice.
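The conditioning scheme described above can be illustrated with a minimal PyTorch sketch. This is not GenerSpeech's actual architecture; the module names, dimensions, and pooling strategy here are simplified assumptions, meant only to show how a style embedding extracted from reference audio can condition an acoustic model at inference time.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Toy reference encoder: pools a mel-spectrogram into one style vector.
    (Hypothetical; GenerSpeech uses a more elaborate multi-level encoder.)"""
    def __init__(self, n_mels=80, style_dim=128):
        super().__init__()
        self.proj = nn.Linear(n_mels, style_dim)

    def forward(self, ref_mel):              # ref_mel: (batch, frames, n_mels)
        return self.proj(ref_mel).mean(dim=1)  # average over time -> (batch, style_dim)

class ConditionedAcousticModel(nn.Module):
    """Toy acoustic model: phoneme embeddings plus a broadcast style vector
    are mapped to mel frames, which a vocoder would turn into a waveform."""
    def __init__(self, vocab=64, hidden=128, style_dim=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.style_proj = nn.Linear(style_dim, hidden)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, tokens, style):        # tokens: (batch, T), style: (batch, style_dim)
        h = self.embed(tokens) + self.style_proj(style).unsqueeze(1)
        return self.out(h)                   # predicted mels: (batch, T, n_mels)

encoder = StyleEncoder()
model = ConditionedAcousticModel()
ref_mel = torch.randn(1, 200, 80)        # features from an unseen reference voice
tokens = torch.randint(0, 64, (1, 50))   # phoneme ids for the target text
style = encoder(ref_mel)                 # zero-shot: no speaker-specific training
mel = model(tokens, style)
print(mel.shape)  # torch.Size([1, 50, 80])
```

Because the style vector is computed from the reference audio alone, no fine-tuning on the new speaker is needed; this is the essence of the zero-shot setup, with the vocoder (HiFi-GAN in GenerSpeech) handling the final mel-to-waveform step.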
Quick Start & Requirements
conda env create -f environment.yaml
conda activate generspeech
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README includes a disclaimer against unauthorized speech generation, which may impose ethical or legal constraints on usage. Specific licensing details for commercial use are not provided.