huggingface
Suite of scripts for tagging speech datasets, especially for TTS model development
Top 75.1% on SourcePulse
Data-Speech is a Python utility suite for annotating speech datasets with natural language descriptions of speaker characteristics, primarily for training text-to-speech (TTS) models like Parler-TTS. It targets researchers and developers in the speech AI domain, enabling reproducible and detailed dataset conditioning.
How It Works
The core process involves three stages: 1) predicting continuous audio features (e.g., speaking rate, SNR, reverberation) using main.py, 2) mapping these continuous features to discrete text bins (e.g., "slightly slowly," "very monotone") via metadata_to_text.py, and 3) generating natural language descriptions from these bins using LLMs via run_prompt_creation.py or run_prompt_creation_llm_swarm.py. This approach allows for fine-grained control over TTS model conditioning.
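The second and third stages can be illustrated with a minimal sketch. It assumes a continuous speaking-rate value has already been predicted in stage one; the bin edges, labels, function names, and prompt wording below are hypothetical and are not taken from metadata_to_text.py or run_prompt_creation.py.

```python
from bisect import bisect_right
from typing import Dict, Sequence

# Hypothetical bin edges and labels for speaking rate (assumed units:
# syllables per second). These are illustrative, not Data-Speech's values.
RATE_EDGES = [2.0, 3.0, 4.0, 5.0]
RATE_LABELS = [
    "very slowly",
    "slightly slowly",
    "moderate speed",
    "slightly fast",
    "very fast",
]


def to_text_bin(value: float, edges: Sequence[float], labels: Sequence[str]) -> str:
    """Map a continuous feature value to a discrete text label.

    `edges` must be sorted ascending; N edges define N + 1 bins.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    # bisect_right returns how many edges are <= value, i.e. the bin index.
    return labels[bisect_right(edges, value)]


def build_prompt(bins: Dict[str, str]) -> str:
    """Assemble a hypothetical LLM prompt from the discrete text bins."""
    traits = ", ".join(f"{name}: {label}" for name, label in bins.items())
    return f"Describe the speaker in one sentence given these attributes: {traits}."


if __name__ == "__main__":
    bins = {"speaking rate": to_text_bin(2.5, RATE_EDGES, RATE_LABELS)}
    print(build_prompt(bins))
```

In this sketch a value of 2.5 falls between the first two edges and maps to "slightly slowly". Keeping the edges and labels as data rather than hard-coded branches makes it easy to adjust the granularity of each attribute independently.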
Quick Start & Requirements
pip install -r requirements.txt after cloning the repository.
datasets library.
GPU acceleration is highly recommended for prompt creation and some feature extraction.
Highlighted Details
Maintenance & Community
datasets, transformers, and accelerate.
Licensing & Compatibility
datasets and transformers ecosystem.
Limitations & Caveats
1 year ago
Inactive