TTS training framework for SpeechLM models
Top 62.9% on SourcePulse
This repository provides the training and modeling code for Inworld's SpeechLM-based Text-To-Speech (TTS) models, enabling users to pre-train, fine-tune, or align their own TTS models. It supports single or multi-GPU setups and is designed for researchers and developers working with advanced speech synthesis.
How It Works
The system uses SpeechLM and 1D audio codecs for TTS generation. It supports distributed training via DDP, DeepSpeed, and FSDP, so it can run on a range of hardware configurations. An included data pipeline converts audio data into audio codes, which are then used to condition the model for speech generation.
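To illustrate what data-parallel training implies for the dataset, here is a minimal sketch of rank-strided sharding, similar in spirit to what PyTorch's `DistributedSampler` does without shuffling. It is not taken from this repository; the function name `shard_indices` is hypothetical, and the repository's own data loading may work differently.

```python
import math


def shard_indices(dataset_len, rank, world_size):
    """Return the sample indices one worker would process (hypothetical helper).

    The index list is padded by wrapping around so every rank receives the
    same number of samples, then split with a stride of `world_size`.
    """
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    if dataset_len == 0:
        return []
    per_rank = math.ceil(dataset_len / world_size)
    total = per_rank * world_size
    # Wrap around the dataset to pad up to a multiple of world_size.
    padded = [i % dataset_len for i in range(total)]
    return padded[rank::world_size]


# Example: 10 samples split across 4 workers (two samples are repeated as padding).
shards = [shard_indices(10, rank, 4) for rank in range(4)]
```

Each worker sees a disjoint slice apart from the padded repeats, which is why the effective batch size scales with the number of GPUs.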
Quick Start & Requirements
Install with make install (with an optional CUDA_VERSION argument).
uv is recommended for package management.
The make install command automates virtual environment creation, PyTorch installation with flash attention, and dependency setup.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The code is only tested on Ubuntu 22.04. Training requires significant computational resources and a prepared dataset in a specific JSONL format. Inference requires multiple model checkpoints (trained model, audio encoder, audio decoder).
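Since training expects a prepared JSONL dataset, a quick pre-flight check can catch malformed lines early. The sketch below is illustrative only: the summary does not state the required schema, so the field names `transcript` and `audio_path` are placeholders, and `validate_jsonl` is a hypothetical helper rather than part of the repository.

```python
import json

# Placeholder field names: the repository's actual schema is not given here.
REQUIRED_FIELDS = ("transcript", "audio_path")


def validate_jsonl(lines, required=REQUIRED_FIELDS):
    """Return (records, errors) for an iterable of JSONL lines.

    `errors` holds (line_number, message) tuples for lines that are not
    valid JSON objects or that lack one of the required fields.
    """
    records, errors = [], []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "expected a JSON object"))
            continue
        missing = [key for key in required if key not in record]
        if missing:
            errors.append((lineno, "missing fields: " + ", ".join(missing)))
            continue
        records.append(record)
    return records, errors


sample = [
    '{"transcript": "Hello there.", "audio_path": "clips/0001.wav"}',
    '{"transcript": "No audio here."}',
    "not json",
]
records, errors = validate_jsonl(sample)
```

To check a real file, pass an open file handle as `lines` and replace `REQUIRED_FIELDS` with the fields the repository's documentation specifies.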