FireRedTTS by FireRedTeam

LLM-empowered TTS system for research

Created 1 year ago
804 stars

Top 43.9% on SourcePulse

Project Summary

FireRedTTS-1S is an open-source, LLM-empowered Text-to-Speech system designed for researchers and developers working with advanced speech synthesis. It offers streamable foundation TTS capabilities, focusing on high-quality voice cloning and multilingual support.

How It Works

This system leverages a foundation TTS architecture, drawing inspiration from models like Tortoise-TTS and XTTS-v2. It incorporates advanced components such as BigCodec for speech compression and Encodec for causal convolutions, aiming for efficient and high-fidelity audio generation. The use of a Transformer-based approach, potentially enhanced by LLM embeddings, allows for nuanced prosody and natural-sounding speech.

Quick Start & Requirements

  • Install:
      1. Clone the repository.
      2. Create a conda environment: conda create --name redtts python=3.10
      3. Install PyTorch with CUDA support, e.g.: conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
      4. Install FireRedTTS from source: pip install -e .
      5. Install the remaining requirements: pip install -r requirements.txt
  • Prerequisites: Python 3.10, PyTorch compatible with CUDA (11.8 or 12.1 supported), and downloaded model files from Model_Lists.
  • Resources: Requires downloading pre-trained models. Setup involves environment creation and package installation.
  • Docs: FireRedTTS-1S Paper, FireRedTTS-1S Demos
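
The prerequisites listed above can be sanity-checked before downloading any model files. The following is a minimal sketch, not part of FireRedTTS: check_environment is a hypothetical helper name, and the check assumes that a CUDA-enabled PyTorch build is what the install step is meant to produce.

```python
import sys


def check_environment(required=(3, 10)):
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems = []
    # The listing names Python 3.10 as the prerequisite.
    if sys.version_info[:2] != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        # torch.version.cuda is None for CPU-only builds.
        if torch.version.cuda is None:
            problems.append("PyTorch build has no CUDA support")
        elif not torch.cuda.is_available():
            problems.append("CUDA build found, but no usable GPU was detected")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment looks OK"]:
        print(line)
```

Running the script inside the redtts environment prints one line per detected problem, or a single confirmation line if none are found.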

Highlighted Details

  • Supports zero-shot voice cloning.
  • Offers multilingual synthesis, with Chinese (zh) language support demonstrated.
  • Utilizes a 24kHz sampling rate.
  • Reference audio recommendations: 3-10 seconds, smooth, natural, with accurate accompanying text.
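
The reference audio recommendation can be checked mechanically before attempting voice cloning. This is a rough sketch using only the Python standard library; check_reference_wav is a hypothetical helper, it handles uncompressed PCM WAV files only, and treating a non-24 kHz prompt as worth a warning is an assumption (the listing states the 24kHz rate but does not say whether prompts must match it).

```python
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # recommended prompt length from the list above
TARGET_RATE = 24_000                  # 24 kHz; assumed here to be relevant to prompts too


def check_reference_wav(path):
    """Return (duration_seconds, sample_rate, warnings) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    warnings = []
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        warnings.append(f"duration {duration:.2f}s is outside the 3-10 s recommendation")
    if rate != TARGET_RATE:
        # Informational only: the toolkit may resample internally.
        warnings.append(f"sample rate {rate} Hz differs from {TARGET_RATE} Hz")
    return duration, rate, warnings
```

The check covers only length and sampling rate. Whether the clip is smooth and natural, and whether the accompanying text is accurate, still has to be verified by listening.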

Maintenance & Community

The project released a technical report in March 2025, followed by pre-trained checkpoints and inference code in April 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README.

Limitations & Caveats

The zero-shot voice cloning functionality is strictly for academic research purposes and must not be used for illegal activities. Developers disclaim liability for misuse.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 2
  • Issues (30d): 1
  • Star history: 53 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

0.1%
4k
TTS model for human-like, expressive speech
Created 1 year ago
Updated 1 year ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

0.3%
51k
Few-shot voice cloning and TTS web UI
Created 1 year ago
Updated 1 week ago