Suite of scripts for tagging speech datasets, especially for TTS model development
Data-Speech is a Python utility suite for annotating speech datasets with natural language descriptions of speaker characteristics, primarily for training text-to-speech (TTS) models like Parler-TTS. It targets researchers and developers in the speech AI domain, enabling reproducible and detailed dataset conditioning.
How It Works
The core process involves three stages: 1) predicting continuous audio features (e.g., speaking rate, SNR, reverberation) using main.py, 2) mapping these continuous features to discrete text bins (e.g., "slightly slowly," "very monotone") via metadata_to_text.py, and 3) generating natural language descriptions from these bins using LLMs via run_prompt_creation.py or run_prompt_creation_llm_swarm.py. This approach allows for fine-grained control over TTS model conditioning.
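As a rough illustration of stages 2 and 3, the sketch below maps continuous feature values to text bins and then assembles a prompt string for an LLM. It is not the code in metadata_to_text.py or run_prompt_creation.py: the bin edges, units, most labels, the prompt wording, and the helper names to_bin, tag_features, and build_prompt are all assumptions made for this example.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels, chosen for illustration only.
# The real scripts may compute their bins differently.
BINS = {
    "speaking_rate": (
        [8.0, 12.0, 16.0, 20.0],  # assumed unit: phonemes per second
        ["very slowly", "slightly slowly", "moderate speed",
         "slightly fast", "very fast"],
    ),
    "pitch_variation": (
        [20.0, 40.0, 60.0, 80.0],  # assumed unit: Hz (standard deviation)
        ["very monotone", "slightly monotone", "moderately expressive",
         "slightly expressive", "very expressive"],
    ),
}


def to_bin(value: float, edges: list[float], labels: list[str]) -> str:
    """Return the label of the bin that contains value.

    edges must be sorted ascending; n edges define n + 1 bins.
    A value equal to an edge falls into the bin above that edge.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]


def tag_features(features: dict[str, float]) -> dict[str, str]:
    """Stage 2 (sketch): continuous features -> discrete text bins."""
    return {
        name: to_bin(value, *BINS[name])
        for name, value in features.items()
        if name in BINS
    }


def build_prompt(tags: dict[str, str]) -> str:
    """Stage 3 (sketch): turn the text bins into an LLM instruction."""
    keywords = ", ".join(f"{name}: {label}" for name, label in tags.items())
    return (
        "Write one sentence describing how the speaker sounds, "
        f"based on these keywords: {keywords}"
    )


tags = tag_features({"speaking_rate": 10.0, "pitch_variation": 15.0})
prompt = build_prompt(tags)
```

With these assumed edges, a speaking rate of 10.0 lands in the "slightly slowly" bin and a pitch variation of 15.0 in "very monotone". The resulting prompt would then go to whichever LLM backend the prompt-creation step uses.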
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt after cloning the repository.
Requires the datasets library. GPU acceleration is highly recommended for prompt creation and some feature extraction.
Highlighted Details
Maintenance & Community
datasets, transformers, and accelerate.
Licensing & Compatibility
datasets and transformers ecosystem.
Limitations & Caveats