LLM-empowered TTS system for research
FireRedTTS-1S is an open-source, LLM-empowered Text-to-Speech system designed for researchers and developers working with advanced speech synthesis. It offers streamable foundation TTS capabilities, focusing on high-quality voice cloning and multilingual support.
How It Works
This system builds on a foundation TTS architecture, drawing inspiration from models such as Tortoise-TTS and XTTS-v2. It uses BigCodec for speech compression and borrows causal convolutions from Encodec, aiming for efficient, high-fidelity audio generation. A Transformer-based approach, potentially enhanced by LLM embeddings, allows for nuanced prosody and natural-sounding speech.
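Causal convolutions matter for a streamable system because each output sample depends only on current and past inputs, so audio can be produced chunk by chunk. The following is a minimal pure-Python sketch of that general idea, not code from FireRedTTS or Encodec; the function names `causal_conv1d` and `streaming_causal_conv1d` are hypothetical and chosen for illustration only.

```python
from typing import Iterable, Iterator, List, Sequence


def causal_conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal 1-D convolution: output[t] depends only on signal[0..t].

    As in deep-learning frameworks, the kernel is not flipped:
    kernel[-1] weights the current sample, kernel[0] the oldest one.
    """
    k = len(kernel)
    # Left-pad with k-1 zeros so that no future samples are ever needed.
    padded = [0.0] * (k - 1) + list(signal)
    out = []
    for t in range(len(signal)):
        window = padded[t:t + k]  # covers signal[t-k+1 .. t]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


def streaming_causal_conv1d(
    chunks: Iterable[Sequence[float]], kernel: Sequence[float]
) -> Iterator[List[float]]:
    """Apply the same convolution chunk by chunk, carrying k-1 samples of history."""
    k = len(kernel)
    history = [0.0] * (k - 1)  # plays the role of the left padding
    for chunk in chunks:
        buf = history + list(chunk)
        yield [
            sum(w * x for w, x in zip(kernel, buf[t:t + k]))
            for t in range(len(chunk))
        ]
        # Keep only the last k-1 samples as context for the next chunk.
        history = buf[len(buf) - (k - 1):]


if __name__ == "__main__":
    kernel = [0.5, 0.5]
    print(causal_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], kernel))
    print(list(streaming_causal_conv1d([[1.0, 2.0], [3.0], [4.0, 5.0]], kernel)))
```

Because the streaming version carries only `k - 1` samples of history, it yields the same values as the full-signal version regardless of how the input is split into chunks.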
Quick Start & Requirements
Create a conda environment (conda create --name redtts python=3.10), install PyTorch with CUDA support (e.g., conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia), install FireRedTTS from source (pip install -e .), and then install the other requirements (pip install -r requirements.txt).
[...] Model_Lists.
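After running the install commands, a quick sanity check of the environment can save debugging time. The sketch below is not part of FireRedTTS; `check_environment` is a hypothetical helper, and treating Python 3.10 as a minimum (rather than an exact pin) is an assumption based on the conda command above. It only uses the standard library plus `torch.cuda.is_available()` when PyTorch is importable.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10)):
    """Report whether the basics from the install steps appear to be present.

    min_python is an assumption: the conda command pins python=3.10,
    and this check treats that version as a lower bound.
    """
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torchaudio_installed": importlib.util.find_spec("torchaudio") is not None,
        "cuda_available": None,  # stays None when torch cannot be imported
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install can be discoverable yet fail to import.
            report["torch_installed"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as False on a GPU machine, the PyTorch build and the installed CUDA driver are the first things to compare.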
Highlighted Details
Language support demonstrated for zh (Chinese).
Maintenance & Community
The project released a technical report in March 2025, followed by pre-trained checkpoints and inference code in April 2025.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README.
Limitations & Caveats
The zero-shot voice cloning functionality is strictly for academic research purposes and must not be used for illegal activities. Developers disclaim liability for misuse.