keonlee9420
Conversational TTS dataset and baseline for dialogue synthesis
Summary
DailyTalk introduces a high-quality spoken dialogue dataset and baseline code for conversational Text-to-Speech (TTS). It addresses the deficiency of conversational context in existing TTS datasets, enabling more natural and context-aware speech synthesis for researchers and developers.
How It Works
The dataset is derived from DailyDialog: dialogues were sampled, modified, and re-recorded to improve speech quality. The baseline is a non-autoregressive TTS model conditioned on the dialogue history, which lets it use conversational context when synthesizing each utterance. This is the main difference from TTS systems that treat each utterance in isolation.
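As a rough illustration of what conditioning on dialogue history involves on the data side, the sketch below pairs each utterance with the turns that precede it. The Turn type, the history_windows helper, and the max_history limit are hypothetical names chosen for this example. They are not taken from the DailyTalk code, which may represent and encode history differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Turn:
    speaker: int  # hypothetical speaker index within the dialogue
    text: str     # transcript of the utterance


def history_windows(
    dialogue: List[Turn], max_history: int = 3
) -> List[Tuple[List[Turn], Turn]]:
    """Pair every turn with up to `max_history` turns that came before it.

    The window size is an assumption made for this sketch, not a value
    documented by the project.
    """
    windows = []
    for i, turn in enumerate(dialogue):
        start = max(0, i - max_history)
        # dialogue[start:i] is empty for the first turn: no history yet.
        windows.append((dialogue[start:i], turn))
    return windows


dialogue = [
    Turn(0, "How was your weekend?"),
    Turn(1, "Pretty good, I went hiking."),
    Turn(0, "That sounds fun."),
    Turn(1, "It was. You should come next time."),
]

for history, current in history_windows(dialogue, max_history=2):
    print(len(history), "previous turn(s) ->", current.text)
```

A history-conditioned model would consume each (history, current) pair rather than the current utterance alone, which is also why the first turn of a dialogue has nothing to condition on.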
Quick Start & Requirements
pip3 install -r requirements.txt or via Dockerfile.
[…] output/ckpt/DailyTalk/), and unzip HiFi-GAN vocoder models. For multi-speaker training, a DeepSpeaker model may be required. Pre-extracted alignments from Montreal Forced Aligner (MFA) are provided or can be generated.
python3 synthesize.py --source preprocessed_data/DailyTalk/val_*.txt --restore_step RESTORE_STEP --mode batch --dataset DailyTalk
[…] python3 prepare_align.py, python3 preprocess.py) followed by training (python3 train.py).
Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps were found in the provided README content.
Licensing & Compatibility
Licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). This license permits commercial use and modification, provided that attribution is given and derivative works are shared under the same terms.
Limitations & Caveats
The system currently only supports batch inference due to its reliance on conversational history. Pretrained models may not have been trained using supervised duration modeling or external speaker embedders.
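Since batch mode is the only supported inference path, a wrapper script can fix that flag and vary the rest. The following is a minimal sketch that assembles the synthesis command quoted in the Quick Start section. The build_synthesize_command helper and the placeholder RESTORE_STEP value are assumptions for illustration; only the script name and flags come from the command listed in this summary.

```python
import shlex
from typing import List


def build_synthesize_command(
    source: str, restore_step: int, dataset: str = "DailyTalk"
) -> List[str]:
    """Return the argv list for a batch synthesis run.

    Uses only the flags that appear in the documented command; the mode
    is fixed to "batch" because that is the only mode stated as supported.
    """
    return [
        "python3", "synthesize.py",
        "--source", source,
        "--restore_step", str(restore_step),
        "--mode", "batch",
        "--dataset", dataset,
    ]


# Placeholder: replace with the step number of the checkpoint to restore.
RESTORE_STEP = 0

# The source path is kept exactly as written in the documented command.
cmd = build_synthesize_command(
    "preprocessed_data/DailyTalk/val_*.txt", restore_step=RESTORE_STEP
)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run(cmd, check=True) from the repository root once the checkpoint and vocoder files are in place. Building a list rather than a single string avoids shell quoting problems with the path argument.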