GenerSpeech by Rongjiehuang

Text-to-speech model for zero-shot style transfer of custom voice

Created 3 years ago

329 stars

Top 83.2% on SourcePulse

Project Summary

GenerSpeech is a PyTorch implementation of a text-to-speech (TTS) model designed for zero-shot style transfer of out-of-domain custom voices. It targets researchers and developers working on expressive and customizable speech synthesis, enabling high-fidelity audio generation with novel voice characteristics.

How It Works

GenerSpeech uses multi-level style transfer to improve generalization to unseen voice styles. It combines an acoustic model, a neural vocoder (HiFi-GAN), and an emotion encoder to produce expressive, high-fidelity speech. Zero-shot style transfer works by conditioning speech generation on a reference audio sample.
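The data flow described above (reference audio to style, text plus style to acoustic frames, frames to waveform) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the function names, the toy "encoder", "acoustic model", and "vocoder" are assumptions made for illustration and are not GenerSpeech's actual modules or API. The sketch only shows how conditioning on a reference clip changes the output for the same text.

```python
from typing import Callable, List

Vector = List[float]
Frames = List[Vector]


def synthesize(
    text: str,
    reference_audio: Vector,
    style_encoder: Callable[[Vector], Vector],
    acoustic_model: Callable[[str, Vector], Frames],
    vocoder: Callable[[Frames], Vector],
) -> Vector:
    """Generic reference-conditioned TTS flow (hypothetical interface)."""
    style = style_encoder(reference_audio)  # style vector from the reference clip
    frames = acoustic_model(text, style)    # acoustic frames conditioned on text + style
    return vocoder(frames)                  # waveform samples


# Toy stand-ins so the sketch runs; real components would be neural networks.
def toy_style_encoder(audio: Vector) -> Vector:
    # Summarize the clip as [mean, peak amplitude].
    return [sum(audio) / len(audio), max(abs(x) for x in audio)]


def toy_acoustic_model(text: str, style: Vector) -> Frames:
    # One frame per character, shifted by each style component.
    return [[ord(ch) / 255.0 + s for s in style] for ch in text]


def toy_vocoder(frames: Frames) -> Vector:
    # Collapse each frame to a single sample.
    return [sum(frame) / len(frame) for frame in frames]


if __name__ == "__main__":
    same_text = "hi"
    out_a = synthesize(same_text, [0.0, 0.5, -0.5],
                       toy_style_encoder, toy_acoustic_model, toy_vocoder)
    out_b = synthesize(same_text, [0.2, 0.2],
                       toy_style_encoder, toy_acoustic_model, toy_vocoder)
    print(out_a != out_b)  # same text, different reference clips
```

Passing the components in as callables keeps the sketch agnostic about how each stage is implemented; no retraining step appears because the reference clip is consumed only at inference time, which is what "zero-shot" refers to here.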

Quick Start & Requirements

Install: Create the conda environment with conda env create -f environment.yaml, then activate it with conda activate generspeech.
Prerequisites: NVIDIA GPU with CUDA and cuDNN.
Pretrained Models: Available for the acoustic model, the HiFi-GAN vocoder, and the emotion encoder.
Demo: The README implies a demo page but does not link to it directly; see the repository at https://github.com/Rongjiehuang/GenerSpeech.
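Before running the install commands, it can help to confirm that the listed prerequisites are visible on the machine. The following is a minimal preflight sketch using only the Python standard library. The function name preflight and the specific checks are assumptions made for illustration, not part of the repository. Finding nvidia-smi on the PATH is only a rough proxy for an NVIDIA driver; it does not verify CUDA or cuDNN.

```python
import shutil
from pathlib import Path


def preflight(repo_dir: str = ".") -> dict:
    """Report whether basic setup prerequisites appear to be present.

    Hypothetical helper: the checks are heuristics, not guarantees.
    """
    return {
        # conda is needed for `conda env create -f environment.yaml`
        "conda_on_path": shutil.which("conda") is not None,
        # nvidia-smi suggests an NVIDIA driver is installed (proxy only)
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # the environment file named in the install step
        "environment_yaml_present": (Path(repo_dir) / "environment.yaml").is_file(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it from the cloned repository root so that the environment.yaml check looks in the right directory.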

Highlighted Details

Zero-shot style transfer for out-of-domain custom voices.
Multi-level style transfer for expressive TTS.
Enhanced model generalization to out-of-domain (OOD) style references.
PyTorch implementation of the NeurIPS 2022 paper.

Maintenance & Community

Project released in December 2022.
Codebase incorporates elements from FastDiff and NATSpeech.
Citation details provided for academic use.

Licensing & Compatibility

License type not explicitly stated in the README.
The README's disclaimer prohibits generating speech without consent. Together with the unstated license, this could restrict commercial use or integration into closed-source applications.

Limitations & Caveats

The README includes a disclaimer against unauthorized speech generation, which may impose ethical or legal constraints on usage. Specific licensing details for commercial use are not provided.

Explore Similar Projects

SpeechGPT-2.0-preview by OpenMOSS

Meta-voicebox by SpeechifyInc

StyleSpeech by KevinMIN95

StyleTTS by yl4579

FireRedTTS by FireRedTeam

speech-synthesis-paper by wenet-e2e

parler-tts by huggingface

higgs-audio by boson-ai

StyleTTS2 by yl4579

metavoice-src by metavoiceio

Zonos by Zyphra

GPT-SoVITS by RVC-Boss