Indic-TTS by AI4Bharat

TTS models for Indic languages

created 2 years ago
260 stars

Top 98.2% on sourcepulse

Project Summary

This project provides Text-to-Speech (TTS) models for 13 Indian languages, described by the authors as state-of-the-art, to address the under-representation of these languages in speech synthesis research. It is aimed at developers and researchers working with Indian languages and reports improved speech quality over existing models.

How It Works

The system uses one architecture for all languages: FastPitch as the acoustic model and HiFi-GAN V1 as the vocoder. Each model is trained jointly on male and female speakers. The authors arrived at this configuration after evaluating a range of acoustic models, vocoders, loss functions, and training schedules, and report that it performs best across both Dravidian and Indo-Aryan languages.
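The data flow of such a two-stage pipeline can be sketched as follows. This is a minimal illustration only: the two functions are placeholders that return silent, correctly shaped output, not FastPitch or HiFi-GAN, and the constants (80 mel bins, 256 samples per frame, 5 frames per character) are assumed values chosen for the example rather than settings taken from the project.

```python
from typing import List

N_MELS = 80          # assumed number of mel bins (illustrative)
HOP_LENGTH = 256     # assumed audio samples per mel frame (illustrative)
FRAMES_PER_CHAR = 5  # assumed fixed duration; a real acoustic model predicts durations


def acoustic_model_stub(text: str, speaker_id: int) -> List[List[float]]:
    """Placeholder acoustic model: text + speaker -> mel frames (all zeros here)."""
    n_frames = len(text) * FRAMES_PER_CHAR
    return [[0.0] * N_MELS for _ in range(n_frames)]


def vocoder_stub(mel: List[List[float]]) -> List[float]:
    """Placeholder vocoder: mel frames -> waveform samples (all zeros here)."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def synthesize(text: str, speaker_id: int = 0) -> List[float]:
    """Chain the two stages: the acoustic model output feeds the vocoder."""
    mel = acoustic_model_stub(text, speaker_id)
    return vocoder_stub(mel)


waveform = synthesize("abc", speaker_id=1)
print(len(waveform))  # number of audio samples produced by the stub pipeline
```

The speaker_id argument stands in for the joint male/female training described above: one model serves both voices, selected at inference time.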

Quick Start & Requirements

  • Install: Clone the Trainer and TTS repositories, run pip3 install -e .[all] in each, then run pip3 install -r requirements.txt.
  • Prerequisites: CUDA 11.3, PyTorch, libsndfile1-dev, ffmpeg, enchant.
  • Resources: Training requires significant computational resources. Inference requires downloading the pre-trained model weights and configuration files.
  • Links: ArXiv Preprint, Audio Samples, Try It Live, Video.
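
For the inference setup, one possible route is the command-line synthesizer that ships with coqui-ai/TTS (TTS.bin.synthesize). The sketch below only assembles the argument list and does not run the model. The helper build_synthesize_command, the directory layout, and the file names are hypothetical; the speaker names accepted by --speaker_idx depend on the downloaded checkpoint.

```python
from pathlib import Path
from typing import List


def build_synthesize_command(text: str, model_dir: str, out_path: str,
                             speaker_idx: str) -> List[str]:
    """Assemble (but do not run) a TTS.bin.synthesize command line."""
    # Hypothetical layout: <model_dir>/fastpitch and <model_dir>/hifigan
    root = Path(model_dir)
    return [
        "python3", "-m", "TTS.bin.synthesize",
        "--text", text,
        "--model_path", str(root / "fastpitch" / "best_model.pth"),
        "--config_path", str(root / "fastpitch" / "config.json"),
        "--vocoder_path", str(root / "hifigan" / "best_model.pth"),
        "--vocoder_config_path", str(root / "hifigan" / "config.json"),
        "--speaker_idx", speaker_idx,
        "--out_path", out_path,
    ]


# Example with placeholder paths and a placeholder speaker name
cmd = build_synthesize_command("namaste", "checkpoints/hi", "out.wav", "female")
print(" ".join(cmd))
```

Once the dependencies and model files are in place, the list can be passed to subprocess.run(cmd, check=True) to produce the audio file.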

Highlighted Details

  • Supports 13 Indian languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil, and Telugu.
  • Achieves significant improvements over existing models as measured by Mean Opinion Scores (MOS).
  • Models are open-sourced on the Bhashini platform.
  • Code references the coqui-ai/TTS library.

Maintenance & Community

  • Developed by AI4Bharat, a mission-driven initiative.
  • Accepted at ICASSP 2023.

Licensing & Compatibility

  • The README does not explicitly state a license. The underlying coqui-ai/TTS library is typically Apache 2.0, but this specific fork's license requires verification.

Limitations & Caveats

  • The setup instructions require manually patching the Trainer library, which suggests the setup may be unstable or still under development.
  • The license is not clearly specified, which may impact commercial use.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 25 stars in the last 90 days
