index-tts by index-tts

Zero-shot TTS system for industrial use

Created 1 year ago

18,934 stars

Top 2.5% on SourcePulse

1 Expert Loves This Project: wsxiaoys (Cofounder of TabbyML)

Project Summary

IndexTTS is an industrial-grade, zero-shot text-to-speech (TTS) system for high-quality, controllable voice synthesis, with particular strength in Chinese. It targets researchers and developers who need advanced TTS capabilities. It offers performance the authors report as state of the art, along with features such as pronunciation correction and precise pause control.

How It Works

IndexTTS builds on XTTS and Tortoise, adding a Conformer-based conditioning encoder and a BigVGAN2-based speechcode decoder. According to the authors, this architecture improves training stability, speaker similarity, and audio quality. Its key additions are character-pinyin hybrid modeling, which enables accurate Chinese pronunciation, and punctuation-based pause control.

Quick Start & Requirements

Install: pip install -r requirements.txt; additionally run pip install -e . to enable CLI usage.
Prerequisites: Python 3.10, ffmpeg, and PyTorch.
Setup: Download the model checkpoints (approx. 1.5 GB) into a checkpoints directory.
Demos: HuggingFace (link), ModelScope (link).
Web UI: pip install -e ".[webui]" && python webui.py
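Before running inference, a quick check of the prerequisites above can save debugging time. The helper below is a hypothetical sketch and is not part of the IndexTTS repository: check_setup is an invented name, and the required file config.yaml is an assumption about the checkpoint layout, so adjust required_files to match the files in the model release you download.

```python
import importlib.util
import shutil
import sys
from pathlib import Path
from typing import List, Sequence

# Assumed checkpoint contents; replace with the files the model release actually ships.
DEFAULT_REQUIRED = ("config.yaml",)


def check_setup(checkpoint_dir: str = "checkpoints",
                required_files: Sequence[str] = DEFAULT_REQUIRED) -> List[str]:
    """Return human-readable setup problems (an empty list means it looks OK)."""
    problems: List[str] = []

    # The project lists Python 3.10 as a prerequisite.
    if sys.version_info[:2] != (3, 10):
        problems.append(
            f"expected Python 3.10, found {sys.version_info.major}.{sys.version_info.minor}"
        )

    # ffmpeg must be callable from the shell.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # PyTorch is installed as the 'torch' package.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch (torch) is not installed")

    # Model weights are expected in a local checkpoints directory.
    ckpt = Path(checkpoint_dir)
    if not ckpt.is_dir():
        problems.append(f"checkpoint directory {ckpt} does not exist")
    else:
        for name in required_files:
            if not (ckpt / name).is_file():
                problems.append(f"missing checkpoint file: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

The function collects every problem instead of stopping at the first one, so a single run reports all missing pieces.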

Highlighted Details

Outperforms popular TTS systems such as XTTS, CosyVoice2, and F5-TTS in the authors' reported benchmarks.
Offers zero-shot voice cloning capabilities.
Features a character-pinyin hybrid approach for improved Chinese pronunciation.
Enables fine-grained control over pauses using punctuation.
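Pause control through punctuation means the punctuation in the input text determines where, and roughly how long, the speech pauses. The sketch below only illustrates how punctuation marks can encode pause positions and lengths; it is not how IndexTTS implements the feature. The PAUSE_MS table and its millisecond values are invented for illustration, and segment_with_pauses is a hypothetical helper that a segment-by-segment synthesis pipeline might use to decide where to insert silence.

```python
import re
from typing import List, Tuple

# Hypothetical pause lengths in milliseconds (full-width and ASCII marks).
# These values are illustrative only and do not come from IndexTTS.
PAUSE_MS = {
    "，": 200, ",": 200, "、": 150,
    "；": 300, ";": 300,
    "。": 400, ".": 400, "！": 400, "!": 400, "？": 400, "?": 400,
}

# Capturing group so re.split keeps the punctuation marks in its output.
_SPLIT_RE = re.compile("([" + re.escape("".join(PAUSE_MS)) + "])")


def segment_with_pauses(text: str) -> List[Tuple[str, int]]:
    """Split text after each pause-bearing mark into (segment, pause_ms) pairs.

    A trailing segment without punctuation gets a pause of 0 ms.
    """
    parts = _SPLIT_RE.split(text)
    result: List[Tuple[str, int]] = []
    # re.split with a capturing group alternates: text, mark, text, mark, ...
    for i in range(0, len(parts), 2):
        segment = parts[i].strip()
        mark = parts[i + 1] if i + 1 < len(parts) else ""
        if segment:
            result.append((segment + mark, PAUSE_MS.get(mark, 0)))
    return result


if __name__ == "__main__":
    print(segment_with_pauses("你好，世界。"))
```

Keeping each mark attached to its segment preserves the punctuation cue for a model that conditions on it, while the returned pause length remains available to a pipeline that inserts silence explicitly.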

Maintenance & Community

Released model parameters and inference code on March 25, 2025.
Paper submitted to arXiv on February 12, 2025.
Community channels include QQ group (553460296) and Discord (link).

Licensing & Compatibility

The repository does not explicitly state a license. Model weights are available via HuggingFace and ModelScope.

Limitations & Caveats

Windows users may have trouble installing pynini with pip; the suggested workaround is to install it with conda.
Some details are available only by contacting the authors, which suggests the public documentation and support may be limited.

Health Check

Last Commit: 2 months ago

Responsiveness: 1 day

Pull Requests (30d): 5

Issues (30d): 15

Star History

727 stars in the last 30 days

Explore Similar Projects

WenetSpeech-Yue by ASLP-lab

Large-scale Cantonese speech dataset and processing pipeline

Created 5 months ago

Updated 2 weeks ago

speech-recognition-uk by egorsmkv

Resource collection for Ukrainian speech AI

Created 5 years ago

Updated 5 months ago

ASR-TTS-paper-daily by halsay

Daily AI paper updates for ASR and TTS research

Created 1 year ago

Updated 1 day ago

Meta-voicebox by SpeechifyInc

PyTorch implementation of Meta's Voicebox speech model

Created 2 years ago

Updated 2 years ago

PortaSpeech by keonlee9420

PyTorch implementation of PortaSpeech, a portable, high-quality generative TTS model

Created 4 years ago

Updated 4 years ago

voicebox-pytorch by lucidrains

PyTorch implementation of MetaAI's Voicebox text-to-speech model

Created 2 years ago

Updated 1 year ago

Fun-ASR by FunAudioLLM

Advanced speech recognition toolkit for global audio

Created 2 months ago

Updated 1 day ago

FireRedTTS by FireRedTeam

LLM-empowered TTS system for research

Created 1 year ago

Updated 5 months ago

parrots by shibing624

ASR/TTS toolkit for multilingual speech processing

Created 7 years ago

Updated 3 months ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs).

speech-synthesis-paper by wenet-e2e

Speech synthesis papers list

Created 5 years ago

Updated 2 years ago

Starred by Jiaming Song (Chief Scientist at Luma AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

fish-speech by fishaudio

Open-source TTS for multilingual speech synthesis

Created 2 years ago

Updated 3 weeks ago

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

Few-shot voice cloning and TTS web UI

Created 2 years ago

Updated 2 weeks ago
