inworld-ai
TTS training framework for SpeechLM models
This repository provides the training and modeling code for Inworld's SpeechLM-based Text-To-Speech (TTS) models, enabling users to pre-train, fine-tune, or align their own TTS models. It supports single or multi-GPU setups and is designed for researchers and developers working with advanced speech synthesis.
How It Works
The system uses a SpeechLM and 1D audio codecs for TTS generation. It supports distributed training via DDP, DeepSpeed, and FSDP, so the same code can run on a range of hardware configurations. A data pipeline is included for converting audio data into audio codes, which are then used to condition the model for speech generation.
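As a rough illustration of how discrete audio codes can condition a language model, the sketch below packs text token ids and 1D codec ids into a single training sequence. The vocabulary sizes, the special-token ids, and the `build_sequence` and `loss_mask` helpers are all hypothetical and chosen only for illustration; the repository's actual token layout may differ.

```python
from typing import List

# Hypothetical sizes and special tokens, chosen only for illustration.
TEXT_VOCAB_SIZE = 32_000
AUDIO_CODEBOOK_SIZE = 4_096
SPEECH_START = TEXT_VOCAB_SIZE + AUDIO_CODEBOOK_SIZE
SPEECH_END = SPEECH_START + 1


def build_sequence(text_ids: List[int], audio_codes: List[int]) -> List[int]:
    """Concatenate text ids and offset audio codes into one token sequence."""
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_ids):
        raise ValueError("text id out of range")
    if any(not 0 <= c < AUDIO_CODEBOOK_SIZE for c in audio_codes):
        raise ValueError("audio code out of range")
    # Shift audio codes past the text vocabulary so the two id spaces do not collide.
    shifted = [TEXT_VOCAB_SIZE + c for c in audio_codes]
    return text_ids + [SPEECH_START] + shifted + [SPEECH_END]


def loss_mask(text_len: int, total_len: int) -> List[int]:
    """Return 0 for the text prompt and start marker, 1 for the speech tokens."""
    return [0] * (text_len + 1) + [1] * (total_len - text_len - 1)
```

In this layout the text acts as the prompt and the loss would typically be applied only to the speech positions, which is one common way to train a decoder-only model for TTS.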
Quick Start & Requirements
Install with make install (with an optional CUDA_VERSION argument).
uv is recommended for package management.
The make install command automates virtual environment creation, PyTorch installation with flash attention, and dependency setup.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The code has only been tested on Ubuntu 22.04. Training requires significant computational resources and a dataset prepared in a specific JSONL format. Inference requires multiple model checkpoints (trained model, audio encoder, audio decoder).
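The exact JSONL schema is not given here, so the following is only a sketch of how a dataset file could be sanity-checked before training. The field names `transcript` and `audio_path` and the `iter_valid_records` helper are assumptions made for illustration; the real required fields should be taken from the repository's data documentation.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical field names; the repository's real schema may differ.
REQUIRED_FIELDS = ("transcript", "audio_path")


def iter_valid_records(lines: Iterable[str]) -> Iterator[Tuple[int, Dict]]:
    """Yield (line_number, record) for JSONL lines that parse and have every required field."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed JSON
        # Keep only JSON objects whose required fields are non-empty strings.
        if isinstance(record, dict) and all(
            isinstance(record.get(field), str) and record[field]
            for field in REQUIRED_FIELDS
        ):
            yield line_number, record
```

Passing an open file handle as `lines` works because file objects iterate line by line; comparing the number of yielded records with the number of lines gives a quick count of rows that would be rejected.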