index-tts by index-tts

Zero-shot TTS system for industrial use

Created 9 months ago
14,631 stars

Top 3.4% on SourcePulse

Project Summary

IndexTTS is an industrial-level zero-shot text-to-speech system designed for high-quality, controllable voice synthesis, particularly excelling in Chinese language scenarios. It targets researchers and developers seeking advanced TTS capabilities, offering state-of-the-art performance and features like pronunciation correction and precise pause control.

How It Works

IndexTTS builds upon XTTS and Tortoise, integrating a conformer conditioning encoder and a BigVGAN2-based speechcode decoder. This architecture enhances training stability, speaker similarity, and audio quality. A key innovation is its character-pinyin hybrid modeling for accurate Chinese pronunciation, alongside punctuation-based pause control.

Quick Start & Requirements

  • Install: run pip install -r requirements.txt, then pip install -e . to enable the CLI.
  • Prerequisites: Python 3.10, ffmpeg, and PyTorch.
  • Setup: Download the model checkpoints (approx. 1.5GB) into a checkpoints directory.
  • Demos: available on HuggingFace and ModelScope.
  • Web UI: pip install -e ".[webui]" && python webui.py
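Before the first run, it can help to confirm the prerequisites listed above. The following is a minimal preflight sketch, assuming the weights live in a local `checkpoints` directory relative to the working directory. The `preflight` function is hypothetical and not part of IndexTTS; because exact checkpoint file names are not given here, it only checks that the directory exists and is non-empty.

```python
import importlib.util
import shutil
import sys
from pathlib import Path
from typing import List


def preflight(checkpoint_dir: str = "checkpoints") -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []

    # The summary lists Python 3.10 as the prerequisite version.
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 10):
        problems.append(f"Python 3.10 is listed as the prerequisite; found {major}.{minor}")

    # ffmpeg must be callable from PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")

    # PyTorch must be importable; find_spec checks without importing it.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch (torch) is not installed")

    # Model weights are expected in a local checkpoints directory.
    # Exact file names are not stated here, so only presence is checked.
    ckpt = Path(checkpoint_dir)
    if not ckpt.is_dir() or not any(ckpt.iterdir()):
        problems.append(f"checkpoint directory '{checkpoint_dir}' is missing or empty")

    return problems
```

For example, running `print(preflight() or "all checks passed")` from the repository root reports anything obviously missing before you launch the CLI or web UI.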

Highlighted Details

  • Reports state-of-the-art results, outperforming popular TTS systems such as XTTS, CosyVoice2, and F5-TTS in the authors' benchmarks.
  • Offers zero-shot voice cloning capabilities.
  • Features a character-pinyin hybrid approach for improved Chinese pronunciation.
  • Enables fine-grained control over pauses using punctuation.
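To make punctuation-based pause control concrete, here is a minimal sketch of the general idea: split the input after punctuation marks and attach a pause length to each segment. The `PAUSE_MS` table, its millisecond values, and `segment_with_pauses` are hypothetical illustrations, not taken from IndexTTS, which may handle pauses differently.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pause lengths in milliseconds per punctuation mark
# (illustrative values only; both full-width and ASCII forms are listed).
PAUSE_MS: Dict[str, int] = {
    "，": 200, ",": 200,
    "、": 150,
    "；": 300, ";": 300,
    "。": 400, ".": 400,
    "！": 400, "!": 400,
    "？": 400, "?": 400,
}

# Capturing group so re.split keeps the punctuation marks in its output.
_SPLIT_RE = re.compile("([" + re.escape("".join(PAUSE_MS)) + "])")


def segment_with_pauses(text: str) -> List[Tuple[str, int]]:
    """Pair each punctuation-terminated segment with the pause that follows it."""
    parts = _SPLIT_RE.split(text)
    segments: List[Tuple[str, int]] = []
    # parts alternates: text, punct, text, punct, ..., trailing text
    for i in range(0, len(parts), 2):
        chunk = parts[i].strip()
        punct = parts[i + 1] if i + 1 < len(parts) else ""
        # Empty chunks (e.g. from repeated punctuation) are dropped.
        if chunk:
            segments.append((chunk + punct, PAUSE_MS.get(punct, 0)))
    return segments


print(segment_with_pauses("你好，世界。今天天气不错！"))
```

Keeping the table as data rather than code makes it easy to tune pause lengths per mark without touching the segmentation logic.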

Maintenance & Community

  • Released model parameters and inference code on March 25, 2025.
  • Paper submitted to arXiv on February 12, 2025.
  • Community channels include a QQ group (553460296) and Discord.

Licensing & Compatibility

  • The repository does not explicitly state a license. Model weights are available via HuggingFace and ModelScope.

Limitations & Caveats

  • Windows users may have trouble installing pynini; in that case it must be installed via conda.
  • Some details are available only by contacting the authors, which may indicate gaps in the public documentation or support.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 13
  • Issues (30d): 52
  • Star History: 1,966 stars in the last 30 days

Explore Similar Projects

Starred by Jiaming Song (Chief Scientist at Luma AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

fish-speech by fishaudio

Top 0.4%
24k stars
Open-source TTS for multilingual speech synthesis
Created 2 years ago
Updated 1 day ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

Top 0.3%
52k stars
Few-shot voice cloning and TTS web UI
Created 1 year ago
Updated 1 month ago