FireRedTTS by FireRedTeam

LLM-empowered TTS system for research

Created 1 year ago
892 stars

Top 40.5% on SourcePulse

Project Summary

FireRedTTS-1S is an open-source, LLM-empowered Text-to-Speech system designed for researchers and developers working with advanced speech synthesis. It offers streamable foundation TTS capabilities, focusing on high-quality voice cloning and multilingual support.

How It Works

The system follows a foundation TTS architecture inspired by Tortoise-TTS and XTTS-v2. It builds on BigCodec for speech compression and on Encodec's causal convolutions, aiming for efficient, high-fidelity audio generation. A Transformer-based approach, potentially enhanced by LLM embeddings, is intended to give nuanced prosody and natural-sounding speech.

Quick Start & Requirements

  • Install:
      • Clone the repository.
      • Create a conda environment: conda create --name redtts python=3.10
      • Install PyTorch with CUDA support, e.g.: conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
      • Install FireRedTTS from source: pip install -e .
      • Install the remaining requirements: pip install -r requirements.txt
  • Prerequisites: Python 3.10, a CUDA-enabled PyTorch build (CUDA 11.8 or 12.1), and the model files downloaded from Model_Lists.
  • Resources: Pre-trained models must be downloaded separately; the rest of the setup is environment creation and package installation.
  • Docs: FireRedTTS-1S Paper, FireRedTTS-1S Demos
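
The prerequisites above can be checked before running anything heavier. The following is a minimal sketch, not part of the FireRedTTS repository: the function name check_environment and the shape of its report are hypothetical, and it only uses standard PyTorch attributes (torch.version.cuda and torch.cuda.is_available()).

```python
import sys


def check_environment(required_python=(3, 10)):
    """Return a small report on the prerequisites listed above.

    Hypothetical helper for illustration; it is not shipped with FireRedTTS.
    """
    report = {
        # The summary lists Python 3.10 as the required interpreter version.
        "python_ok": sys.version_info[:2] == tuple(required_python),
        "torch_installed": False,
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a usable GPU is visible right now
    }
    try:
        import torch
    except Exception:
        # PyTorch is missing or its installation is broken.
        return report

    report["torch_installed"] = True
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the install command shown above, cuda_build would be expected to read 11.8; a value of None suggests a CPU-only PyTorch build was installed instead.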

Highlighted Details

  • Supports zero-shot voice cloning.
  • Offers multilingual synthesis, with Chinese (zh) language support demonstrated.
  • Utilizes a 24kHz sampling rate.
  • Reference audio recommendations: 3-10 seconds, smooth, natural, with accurate accompanying text.
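
The 3-10 second recommendation for reference audio is easy to verify automatically. The sketch below assumes the prompt is an uncompressed PCM WAV file and uses only Python's standard wave module; check_reference_audio and its parameters are hypothetical names, not part of FireRedTTS. It cannot judge whether the audio is smooth or natural, or whether the transcript is accurate, so those still need a human listen.

```python
import wave


def check_reference_audio(path, transcript, min_seconds=3.0, max_seconds=10.0):
    """Report duration, sample rate, and obvious problems for a WAV prompt.

    Hypothetical helper for illustration; only PCM WAV files are supported.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        # Duration in seconds = number of frames / frames per second.
        duration = wav.getnframes() / sample_rate

    problems = []
    if duration < min_seconds:
        problems.append(f"too short: {duration:.2f}s < {min_seconds}s")
    elif duration > max_seconds:
        problems.append(f"too long: {duration:.2f}s > {max_seconds}s")
    if not transcript.strip():
        problems.append("transcript is empty")

    return {
        "duration_s": duration,
        "sample_rate_hz": sample_rate,
        "problems": problems,
    }
```

An empty problems list means the clip falls inside the recommended range and comes with some transcript text. The sample rate is reported but not enforced, since the summary states only that the system uses 24kHz and does not say whether prompts must already be at that rate.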

Maintenance & Community

The project published its technical report in March 2025, followed by pre-trained checkpoints and inference code in April 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The zero-shot voice cloning functionality is strictly for academic research purposes and must not be used for illegal activities. Developers disclaim liability for misuse.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 7 stars in the last 30 days
