Speech synthesis research paper using LLaMA
Top 55.5% on sourcepulse
LLaSA is a framework for speech synthesis that scales both training and inference compute for LLaMA-based models. It targets researchers and developers working on large-scale, multilingual text-to-speech systems, offering a unified approach to handle both text and speech tokens.
How It Works
LLaSA employs a unified tokenizer that combines the text tokens of LLaMA models with specialized speech tokens derived from X-codec2 in a single shared vocabulary. This allows one autoregressive language model to be trained end-to-end over mixed text-and-speech sequences, so compute can be scaled efficiently at both training and inference time.
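A minimal sketch of what such a unified vocabulary could look like, assuming speech tokens are appended after the text vocabulary via a fixed ID offset. All names, vocabulary sizes, and helper functions below are illustrative assumptions, not the project's actual API.

```python
# Hypothetical unified token space: text tokens keep their original LLaMA IDs,
# and X-codec2 codebook indices are shifted past the text vocabulary.
TEXT_VOCAB_SIZE = 128_256   # assumed LLaMA text vocabulary size
SPEECH_VOCAB_SIZE = 65_536  # assumed X-codec2 codebook size

def speech_to_unified(codec_ids):
    """Map X-codec2 codebook indices into the shared vocabulary."""
    return [TEXT_VOCAB_SIZE + i for i in codec_ids]

def build_training_sequence(text_ids, codec_ids):
    """Concatenate text and speech tokens into one sequence for the LM."""
    return text_ids + speech_to_unified(codec_ids)

seq = build_training_sequence([101, 2543, 7], [0, 12, 4095])
# All speech tokens land above the text vocabulary range.
assert all(t >= TEXT_VOCAB_SIZE for t in seq[3:])
```

With this layout, a single next-token objective covers both modalities: the model reads text tokens as a prompt and generates speech tokens, which are then shifted back down and decoded to audio by the codec.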
Quick Start & Requirements
# single-node training on 8 GPUs
torchrun --nproc_per_node=8 train_tts.py config.json
# or submit as a Slurm batch job
sbatch run_slurm.sh
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The training recipe relies on internal datasets that are not publicly released, which limits reproducibility for users without access to comparable data. No license is specified, leaving commercial use legally unclear.