FireRedTTS by FireRedTeam

LLM-empowered TTS system for research

Created 1 year ago

902 stars

Top 40.1% on SourcePulse

Project Summary

FireRedTTS-1S is an open-source, LLM-empowered text-to-speech (TTS) system for researchers and developers working on speech synthesis. It is a streamable foundation TTS system that focuses on high-quality voice cloning and multilingual support.

How It Works

The system follows a foundation TTS architecture inspired by models such as Tortoise-TTS and XTTS-v2. It builds on BigCodec for speech compression and on Encodec for causal convolutions, aiming for efficient, high-fidelity audio generation. A Transformer-based approach, potentially enhanced by LLM embeddings, is intended to give nuanced prosody and natural-sounding speech.
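
As a rough illustration of why causal convolutions suit a streamable system, the sketch below implements a plain causal 1-D convolution in pure Python. It is not code from FireRedTTS or Encodec; the function name `causal_conv1d` and the toy signal are assumptions made for this example only. Each output sample depends on the current and earlier inputs, never on later ones.

```python
def causal_conv1d(signal, kernel):
    """Causal 1-D convolution: out[t] = sum_j kernel[j] * signal[t - j].

    Hypothetical helper for illustration; not part of any TTS library.
    """
    k = len(kernel)
    # Pad only on the left with k-1 zeros, so no future sample is ever read.
    padded = [0.0] * (k - 1) + list(signal)
    out = []
    for t in range(len(signal)):
        window = padded[t:t + k]  # covers signal[t-k+1 .. t]
        # kernel[0] weights the current sample, kernel[k-1] the oldest one.
        out.append(sum(w * x for w, x in zip(reversed(kernel), window)))
    return out


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    smoothing = [0.5, 0.5]  # toy two-tap averaging kernel
    print(causal_conv1d(samples, smoothing))  # [0.5, 1.5, 2.5, 3.5]
```

Because of the left-only padding, the output computed for a prefix of the signal equals the prefix of the full output, which is the property that lets audio be emitted chunk by chunk.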

Quick Start & Requirements

Install:
1. Clone the repository.
2. Create a conda environment: conda create --name redtts python=3.10
3. Install PyTorch with CUDA support, e.g.: conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
4. Install FireRedTTS from source: pip install -e .
5. Install the other requirements: pip install -r requirements.txt
Prerequisites: Python 3.10, a CUDA-enabled PyTorch build (CUDA 11.8 or 12.1 supported), and the model files downloaded from Model_Lists.
Resources: The pre-trained models must be downloaded separately. Setup consists of creating the environment and installing the packages.
Docs: FireRedTTS-1S Paper, FireRedTTS-1S Demos
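
The prerequisites above can be checked before running anything heavier. The following is a minimal sketch, assuming an exact Python 3.10 match is wanted; the helper names `installed_cuda_version` and `check_environment` are hypothetical and not part of FireRedTTS. The only PyTorch attribute used is `torch.version.cuda`, which reports the CUDA version the installed build was compiled against.

```python
import sys

# Values taken from the prerequisites listed above.
REQUIRED_PYTHON = (3, 10)
SUPPORTED_CUDA = ("11.8", "12.1")


def installed_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


def check_environment(python_version, cuda_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor != REQUIRED_PYTHON:
        problems.append("Python %d.%d found, 3.10 expected" % major_minor)
    if cuda_version is None:
        problems.append("No CUDA-enabled PyTorch build found")
    elif cuda_version not in SUPPORTED_CUDA:
        problems.append("CUDA %s found, 11.8 or 12.1 expected" % cuda_version)
    return problems


if __name__ == "__main__":
    for problem in check_environment(sys.version_info, installed_cuda_version()):
        print("WARNING:", problem)
```

Run inside the activated conda environment, the script prints nothing when both checks pass. It does not verify that the model files have been downloaded.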

Highlighted Details

Supports zero-shot voice cloning.
Offers multilingual synthesis, with Chinese (zh) language support demonstrated.
Uses a 24 kHz sampling rate.
Reference audio recommendations: 3-10 seconds long, smooth and natural, with accurate accompanying text.
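
The duration recommendation for reference audio is easy to check automatically. Below is a minimal sketch using only Python's standard `wave` module; the function names `inspect_reference_wav` and `write_silent_wav` are hypothetical and not part of FireRedTTS, and the sketch assumes the reference clip is an uncompressed WAV file. The sample rate is reported for information only, since the summary does not say which rate the reference clip must use.

```python
import os
import tempfile
import wave

# Bounds taken from the recommendation above (3-10 seconds).
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def inspect_reference_wav(path, min_sec=MIN_SECONDS, max_sec=MAX_SECONDS):
    """Report sample rate and duration, and whether the duration is in range."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    return {
        "sample_rate": rate,
        "duration_sec": duration,
        "duration_ok": min_sec <= duration <= max_sec,
    }


def write_silent_wav(path, seconds, rate=24000):
    """Write mono 16-bit silence; only used to demonstrate the check."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "prompt.wav")
        write_silent_wav(demo, seconds=5)
        print(inspect_reference_wav(demo))
```

Smoothness, naturalness, and transcript accuracy cannot be checked this way and still need a listen and a read-through.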

Maintenance & Community

The project released a technical report in March 2025, followed by pre-trained checkpoints and inference code in April 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The zero-shot voice cloning functionality is strictly for academic research purposes and must not be used for illegal activities. Developers disclaim liability for misuse.
