asr-pub: High-quality speech synthesis via LoRA fine-tuning
Top 99.6% on SourcePulse
This project provides LoRA fine-tuning for the index-tts high-quality speech synthesis model. It enables users to enhance prosody and naturalness for both single-speaker and multi-speaker voice generation, targeting researchers and developers who want customized TTS capabilities.
How It Works
This project implements LoRA (Low-Rank Adaptation) fine-tuning on top of the index-tts speech synthesis model. The core workflow involves preparing custom datasets by extracting audio tokens and speaker conditioning vectors using provided Python scripts. These processed features, alongside speaker metadata, are then fed into the training script (train.py) to adapt the model. LoRA enables efficient adaptation by training only a small number of additional parameters, significantly reducing computational cost and memory requirements compared to full model fine-tuning, while aiming to enhance prosody and naturalness.
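The README does not spell out which layers are adapted or what rank is used, so the following is only a minimal PyTorch sketch of the general LoRA technique it describes: freeze the base weights and train a small low-rank update on selected linear layers. The module names and sizes are illustrative stand-ins, not taken from index-tts or its train.py.

```python
# Minimal, hypothetical LoRA sketch in pure PyTorch.
# Names and shapes are illustrative; they are not the index-tts internals.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the original weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Toy module standing in for the TTS model's projection layers.
model = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 256))
model[0] = LoRALinear(model[0])               # adapt only selected layers
model[2] = LoRALinear(model[2])

# Only the adapter parameters are trainable, which is what keeps the
# compute and memory cost low compared to full fine-tuning.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# One illustrative step with random tensors in place of audio tokens
# and speaker conditioning vectors.
x, target = torch.randn(4, 256), torch.randn(4, 256)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
```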
Quick Start & Requirements
python tools/extract_codec.py --audio_list ${audio_list} --extract_condition
python train.py
python indextts/infer.py

Each line of audio_list pairs an audio path with its transcript (e.g., /path/to/audio.wav 小朋友们,大家好...). GPU acceleration is recommended. The project builds on the index-tts library; a specific Python version and other non-default system requirements are not detailed.
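The README only describes the list format as an audio path followed by its transcript on each line. Below is a small sketch, under that assumption, of assembling such a file; the paths, transcripts, and output filename are hypothetical.

```python
# Hypothetical helper for building the audio_list file consumed by
# tools/extract_codec.py: one line per clip, "audio_path transcript".
pairs = [
    ("/data/spk1/0001.wav", "transcript for the first clip"),
    ("/data/spk1/0002.wav", "transcript for the second clip"),
]

with open("audio_list.txt", "w", encoding="utf-8") as f:
    for wav, text in pairs:
        f.write(f"{wav} {text}\n")
```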
Limitations & Caveats
Transcripts used for fine-tuning were automatically generated by ASR and punctuation models, so they may contain errors that affect synthesis quality. The README does not specify supported operating systems or explicit hardware requirements beyond the implicit need for a GPU.
Last updated 1 month ago · Inactive · yl4579