Deep learning approaches for emotional text-to-speech
This repository explores deep learning approaches for emotional Text-to-Speech (TTS) synthesis, targeting researchers and practitioners in speech synthesis. It details experimental findings from fine-tuning Tacotron and DC-TTS models on emotional speech datasets, offering insights into effective strategies for low-resource emotional TTS.
How It Works
The project investigates fine-tuning pre-trained Tacotron and DC-TTS models on emotional speech datasets like RAVDESS and EMOV-DB. Key strategies include adjusting learning rates, switching optimizers (Adam to SGD), freezing specific model components (encoder, postnet), and using single-speaker data per emotion. These methods aim to mitigate "catastrophic forgetting" and improve emotional expressiveness in synthesized speech.
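The freezing and optimizer-switching strategies can be illustrated with a minimal PyTorch sketch. It is not the repository's code: `TinyTTS`, its layer sizes, the helper `build_finetune_optimizer`, and the learning rate are hypothetical stand-ins chosen only to show the mechanics, assuming a model that exposes `encoder`, `decoder`, and `postnet` submodules.

```python
import torch
from torch import nn


class TinyTTS(nn.Module):
    """Hypothetical stand-in for a pre-trained TTS model (not Tacotron or DC-TTS)."""

    def __init__(self, vocab_size=32, hidden=16, n_mels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Embedding(vocab_size, hidden), nn.Linear(hidden, hidden)
        )
        self.decoder = nn.Linear(hidden, n_mels)
        self.postnet = nn.Linear(n_mels, n_mels)

    def forward(self, tokens):
        mel = self.decoder(torch.tanh(self.encoder(tokens)))
        return mel + self.postnet(mel)  # residual refinement


def build_finetune_optimizer(model, frozen=("encoder", "postnet"), lr=1e-3):
    """Freeze the named submodules and return SGD over the remaining parameters.

    The frozen components and the use of SGD follow the strategies listed above;
    the learning rate is an assumed placeholder, not a value from the project.
    """
    for name in frozen:
        for param in getattr(model, name).parameters():
            param.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr)


torch.manual_seed(0)
model = TinyTTS()  # in practice, pre-trained weights would be loaded here
optimizer = build_finetune_optimizer(model)

# One fine-tuning step on random stand-in data (token ids -> mel frames).
tokens = torch.randint(0, 32, (4, 10))
target = torch.randn(4, 10, 8)
encoder_before = [p.clone() for p in model.encoder.parameters()]
decoder_before = model.decoder.weight.clone()

loss = nn.functional.l1_loss(model(tokens), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, only the decoder weights have moved. Keeping the encoder and postnet fixed is one way to limit how far a small emotional dataset can pull the model away from what it learned during pre-training.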
Quick Start & Requirements
r9y9/tacotron and tugstugi/dc-tts are used.
Highlighted Details
A configuration including top_db=20 and monotonic_attention=True successfully generated "Anger" with good quality.
Maintenance & Community
The project was released in June 2020 by a team of authors from IIIT Delhi. Contact information for project members is provided for support.
Licensing & Compatibility
The repository is licensed under the MIT License, allowing for commercial use and modification.
Limitations & Caveats