keonlee9420
Conversational TTS dataset and baseline for dialogue synthesis
Summary
DailyTalk introduces a high-quality spoken dialogue dataset and baseline code for conversational Text-to-Speech (TTS). It addresses the deficiency of conversational context in existing TTS datasets, enabling more natural and context-aware speech synthesis for researchers and developers.
How It Works
The dataset is derived from DailyDialog, enhanced through sampling, modification, and re-recording for improved speech quality. A non-autoregressive TTS model forms the baseline, uniquely conditioned on historical dialogue information. This approach allows the model to effectively capture and leverage conversational context, a key differentiator from utterance-centric TTS systems.
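To make "conditioned on historical dialogue information" concrete, the following is a minimal sketch of how each utterance could be paired with its preceding turns before being passed to a model. It is an illustration only, not the repository's implementation: the Turn class, the history_windows function, and the max_history parameter are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One utterance in a dialogue (hypothetical structure for illustration)."""
    speaker: int
    text: str


def history_windows(
    dialogue: List[Turn], max_history: int = 2
) -> List[Tuple[List[Turn], Turn]]:
    """Pair every turn with up to `max_history` turns that precede it.

    `max_history` is an assumed knob; the real baseline may use a
    different context length or representation.
    """
    pairs = []
    for i, turn in enumerate(dialogue):
        start = max(0, i - max_history)
        pairs.append((dialogue[start:i], turn))
    return pairs


dialogue = [
    Turn(0, "Hi, how are you?"),
    Turn(1, "Fine, thanks. And you?"),
    Turn(0, "Pretty good."),
]
windows = history_windows(dialogue, max_history=2)
for history, current in windows:
    print(len(history), "context turn(s) ->", current.text)
```

The first turn has an empty history, which is why an utterance-centric system and a context-aware one would see the same input only at the start of a dialogue.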
Quick Start & Requirements
Install: pip3 install -r requirements.txt, or build from the provided Dockerfile.
Prerequisites: place the pretrained models in output/ckpt/DailyTalk/ and unzip the HiFi-GAN vocoder models. For multi-speaker training, a DeepSpeaker model may be required. Pre-extracted alignments from Montreal Forced Aligner (MFA) are provided, or they can be generated.
Inference: python3 synthesize.py --source preprocessed_data/DailyTalk/val_*.txt --restore_step RESTORE_STEP --mode batch --dataset DailyTalk
Training: run preprocessing (python3 prepare_align.py, then python3 preprocess.py), followed by python3 train.py.
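The commands above can be collected into one ordered workflow. The sketch below only builds and prints the command lines and does not execute anything. The workflow function and its stage names are hypothetical. RESTORE_STEP is the placeholder from the summary and must be replaced with a real checkpoint step. The preprocessing and training scripts are listed without flags because none are shown here; they may need additional arguments in practice.

```python
import shlex
from typing import Dict, List


def workflow(restore_step: str = "RESTORE_STEP") -> Dict[str, List[List[str]]]:
    """Return the quick-start commands grouped by stage (nothing is executed)."""
    return {
        "setup": [
            ["pip3", "install", "-r", "requirements.txt"],
        ],
        "training": [
            # Flags are omitted because the summary does not show any.
            ["python3", "prepare_align.py"],
            ["python3", "preprocess.py"],
            ["python3", "train.py"],
        ],
        "inference": [
            [
                "python3", "synthesize.py",
                "--source", "preprocessed_data/DailyTalk/val_*.txt",
                "--restore_step", str(restore_step),
                "--mode", "batch",
                "--dataset", "DailyTalk",
            ],
        ],
    }


commands = workflow()
for stage, stage_commands in commands.items():
    print(f"[{stage}]")
    for argv in stage_commands:
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print("  " + shlex.join(argv))
```

Keeping each command as an argument list rather than a single string avoids quoting mistakes if the lists are later handed to subprocess.run.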
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps were found in the provided README content.
Licensing & Compatibility
Licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA 4.0). This license permits commercial use and modification, provided that attribution is given and derivative works are shared under the same terms.
Limitations & Caveats
The system currently only supports batch inference due to its reliance on conversational history. Pretrained models may not have been trained using supervised duration modeling or external speaker embedders.