FireRedTTS by FireRedTeam

LLM-empowered TTS system for research

created 11 months ago
753 stars

Top 47.1% on sourcepulse

Project Summary

FireRedTTS-1S is an open-source, LLM-empowered Text-to-Speech system designed for researchers and developers working with advanced speech synthesis. It offers streamable foundation TTS capabilities, focusing on high-quality voice cloning and multilingual support.

How It Works

The system is built as a foundation TTS model and draws inspiration from Tortoise-TTS and XTTS-v2. It uses BigCodec for speech compression and Encodec for causal convolutions, aiming for efficient, high-fidelity audio generation. A Transformer-based approach, potentially enhanced by LLM embeddings, is intended to produce nuanced prosody and natural-sounding speech.

Quick Start & Requirements

  • Install:
      1. Clone the repository.
      2. Create a conda environment: conda create --name redtts python=3.10
      3. Install PyTorch with CUDA support, e.g.: conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
      4. Install FireRedTTS from source: pip install -e .
      5. Install the remaining requirements: pip install -r requirements.txt
  • Prerequisites: Python 3.10, a CUDA-enabled PyTorch build (CUDA 11.8 or 12.1 supported), and the model files downloaded from Model_Lists.
  • Resources: The pre-trained models must be downloaded separately; the rest of the setup is environment creation and package installation.
  • Docs: FireRedTTS-1S Paper, FireRedTTS-1S Demos

Highlighted Details

  • Supports zero-shot voice cloning.
  • Offers multilingual synthesis, with Chinese (zh) language support demonstrated.
  • Uses a 24 kHz sampling rate.
  • Recommended reference audio: 3-10 seconds long, smooth and natural, with accurate accompanying text.
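
The reference-audio recommendation can be checked before a clip is passed to the model. The following is a minimal sketch that uses only Python's standard `wave` module and is not part of FireRedTTS; the function names `reference_audio_info` and `check_reference_duration` are hypothetical, and it assumes the clip is an uncompressed PCM WAV file. It checks only the 3-10 second duration guideline and reports the clip's sample rate for information; whether a clip is "smooth" and "natural" still has to be judged by ear.

```python
import wave


def reference_audio_info(path):
    """Return (duration_seconds, sample_rate_hz) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return frames / rate, rate


def check_reference_duration(path, min_seconds=3.0, max_seconds=10.0):
    """Check a clip against the 3-10 second recommendation (bounds inclusive).

    The bounds come from the recommendation above; this helper is an
    illustration and is not an API provided by FireRedTTS.
    """
    duration, rate = reference_audio_info(path)
    return {
        "duration_s": duration,
        "sample_rate_hz": rate,
        "duration_ok": min_seconds <= duration <= max_seconds,
    }
```

Calling `check_reference_duration("prompt.wav")` on a clip (the file name is a placeholder) returns a small dictionary, so a script can skip or flag prompts that fall outside the recommended length.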

Maintenance & Community

The project released a technical report in March 2025, followed by pre-trained checkpoints and inference code in April 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The zero-shot voice cloning functionality is strictly for academic research purposes and must not be used for illegal activities. The developers disclaim liability for misuse.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 64 stars in the last 90 days
