StyleSpeech by KevinMIN95

Multi-speaker adaptive TTS generation

Created 4 years ago

252 stars

Top 99.6% on SourcePulse

Project Summary

StyleSpeech and its extension, Meta-StyleSpeech, are text-to-speech (TTS) models for multi-speaker adaptive speech generation: they synthesize high-quality speech in a target speaker's voice from a short reference audio clip. The project targets researchers and developers in speech synthesis who need speaker adaptation without extensive fine-tuning.

How It Works

The core component is Style-Adaptive Layer Normalization (SALN), which sets the gain and bias applied to the text input according to the speaker style extracted from a reference audio sample. Meta-StyleSpeech extends StyleSpeech with two discriminators trained with style prototypes and with episodic training, which improves adaptation to new speakers from very short (1-3 second) audio clips.
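The idea behind SALN can be illustrated with a minimal sketch in plain Python. This is not the repository's implementation: the function names, the use of two separate linear projections for gain and bias, and the toy dimensions are all assumptions made for illustration. The sketch normalizes one hidden feature vector, then scales and shifts it with values predicted from a style vector.

```python
import math
from typing import List, Sequence


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(w: Sequence[Sequence[float]], b: Sequence[float], s: Sequence[float]) -> List[float]:
    """Affine projection w @ s + b, with w given as a list of rows."""
    return [sum(wi * si for wi, si in zip(row, s)) + bi for row, bi in zip(w, b)]


def saln(
    hidden: Sequence[float],
    style: Sequence[float],
    w_gain: Sequence[Sequence[float]],
    b_gain: Sequence[float],
    w_bias: Sequence[Sequence[float]],
    b_bias: Sequence[float],
) -> List[float]:
    """Style-adaptive layer norm sketch: gain and bias come from the style vector.

    Hypothetical parameterization: two linear projections of the style vector,
    one producing the per-feature gain and one producing the per-feature bias.
    """
    gain = linear(w_gain, b_gain, style)
    bias = linear(w_bias, b_bias, style)
    normed = layer_norm(hidden)
    return [g * n + b for g, n, b in zip(gain, normed, bias)]


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0, 4.0]          # one text-side hidden vector (toy values)
    style = [0.5, -0.5]                    # style vector from a reference clip (toy values)
    w_gain = [[0.1, 0.0], [0.0, 0.1], [0.2, 0.0], [0.0, 0.2]]
    w_bias = [[0.0, 0.3], [0.3, 0.0], [0.0, 0.1], [0.1, 0.0]]
    print(saln(hidden, style, w_gain, [1.0] * 4, w_bias, [0.0] * 4))
```

Because the gain and bias depend only on the style vector, changing the reference clip changes how every normalized text feature is scaled and shifted, without touching the other model weights.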

Quick Start & Requirements

Install: Clone the repository and install Python requirements from requirements.txt.
Prerequisites: Requires pre-trained models (links provided in README) and a reference speech audio sample. Montreal Forced Aligner (MFA) is used for dataset preprocessing.
Inference: python synthesize.py --text <raw text> --ref_audio <path to reference audio> --checkpoint_path <path to pretrained model>
Dataset: Trained on LibriTTS. Preprocessing involves resampling, forced alignment with MFA, and generating mel-spectrograms, durations, pitch, and energy.
Links: Demo audio samples (demo page)
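The inference command above can also be assembled from Python, which avoids shell-quoting problems when the text contains spaces or punctuation. The sketch below uses only the three flags shown in the command; the helper name `build_synthesize_command` and the file names are hypothetical placeholders, and the actual run is left commented out because it assumes the working directory is a checkout of the repository with a downloaded checkpoint.

```python
import shlex
import sys
from typing import List


def build_synthesize_command(text: str, ref_audio: str, checkpoint_path: str) -> List[str]:
    """Build the argument list for synthesize.py using the flags from the summary."""
    return [
        sys.executable, "synthesize.py",
        "--text", text,
        "--ref_audio", ref_audio,
        "--checkpoint_path", checkpoint_path,
    ]


if __name__ == "__main__":
    # Placeholder paths: substitute a real reference clip and pretrained checkpoint.
    cmd = build_synthesize_command(
        "Hello, this is a test sentence.",
        "reference.wav",
        "pretrained_model.pth",
    )
    print(shlex.join(cmd))  # shell-safe rendering of the command for inspection
    # import subprocess
    # subprocess.run(cmd, check=True)  # run only from inside the cloned repository
```

Passing the arguments as a list keeps the raw text intact as a single argument, so no manual escaping is needed.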

Highlighted Details

Synthesizes high-quality speech that closely follows a target speaker's voice.
Adapts to new speakers using only a single short (1-3 second) speech sample.
Reported by the authors to outperform baseline methods in speaker adaptation quality.
Offers official implementations for both StyleSpeech and Meta-StyleSpeech.

Maintenance & Community

The project was last updated in December 2021. No community links (Discord, Slack) or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. It references other projects, implying potential licensing considerations from those dependencies. Commercial use compatibility is not specified.

Limitations & Caveats

The last update was in late 2021, so open issues may be unaddressed and the project may no longer be actively developed. The README gives no information on compatibility with newer Python versions or hardware accelerators beyond what can be inferred from its dependencies.

Explore Similar Projects

ControlSpeech by jishengpeng

index-tts-lora by asr-pub

pheme by PolyAI-LDN

SpeechGPT-2.0-preview by OpenMOSS

GenerSpeech by Rongjiehuang

StyleTTS by yl4579

parler-tts by huggingface

higgs-audio by boson-ai

StyleTTS2 by yl4579

metavoice-src by metavoiceio

whisper-vits-svc by PlayVoice

Real-Time-Voice-Cloning by CorentinJ