Indic-TTS by AI4Bharat

TTS models for Indic languages

Created 3 years ago

336 stars

Top 82.2% on SourcePulse

Project Summary

This project provides state-of-the-art Text-to-Speech (TTS) models for 13 Indian languages, addressing the under-representation of these languages in speech synthesis research. It targets developers and researchers working with Indian languages, offering improved speech quality over existing models.

How It Works

The system uses a unified architecture: FastPitch as the acoustic model and HiFi-GAN V1 as the vocoder. Models are trained jointly on male and female speakers, a configuration chosen after extensive evaluation of acoustic models, vocoders, loss functions, and training schedules. This setup outperforms existing models for both Dravidian and Indo-Aryan languages.
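To make the two-stage pipeline concrete, here is a toy, pure-Python sketch of two pieces it relies on: the duration-based upsampling ("length regulator") that FastPitch uses to turn per-symbol encodings into mel-frame features, and the frame-to-sample arithmetic of a HiFi-GAN vocoder. This is an illustration, not code from this repository. The upsampling factors (8, 8, 2, 2), i.e. 256 samples per mel frame, come from the original HiFi-GAN V1 configuration and are an assumption here; the Indic-TTS configs may differ. The names `length_regulate` and `num_output_samples` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

# Assumption: upsampling factors of the original HiFi-GAN V1 release; their
# product (256) is the number of audio samples generated per mel frame.
HIFIGAN_V1_UPSAMPLE_RATES = (8, 8, 2, 2)


def length_regulate(encodings: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Toy FastPitch-style length regulator (hypothetical helper).

    Repeats each input-symbol encoding durations[i] times so that
    symbol-level features become mel-frame-level features.
    """
    if len(encodings) != len(durations):
        raise ValueError("need exactly one duration per input symbol")
    frames: List[Vector] = []
    for vec, dur in zip(encodings, durations):
        if dur < 0:
            raise ValueError("durations must be non-negative")
        # Independent copies, so editing one frame does not alias the others.
        frames.extend(list(vec) for _ in range(dur))
    return frames


def num_output_samples(num_frames: int,
                       upsample_rates: Sequence[int] = HIFIGAN_V1_UPSAMPLE_RATES) -> int:
    """Waveform samples emitted by a vocoder with these upsampling factors."""
    return num_frames * math.prod(upsample_rates)


if __name__ == "__main__":
    # Toy values: 3 input symbols with 2-dim encodings and predicted durations.
    enc = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    dur = [2, 0, 3]
    frames = length_regulate(enc, dur)
    print(len(frames))                      # 5 mel frames
    print(num_output_samples(len(frames)))  # 1280 samples (5 * 256)
```

A duration of 0 simply drops a symbol from the frame sequence, which is why the output has 5 frames rather than 6.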

Quick Start & Requirements

Install: Clone the Trainer and TTS repositories, run pip3 install -e .[all] in each, then run pip3 install -r requirements.txt.
Prerequisites: CUDA 11.3, PyTorch, libsndfile1-dev, ffmpeg, enchant.
Resources: Training requires significant compute. Inference requires downloading the pre-trained model weights and configuration files.
Links: arXiv Preprint, Audio Samples, Try It Live, Video.
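One plausible way to run inference on the downloaded weights is the command-line synthesis script of the upstream coqui-ai/TTS library. The helper below only assembles that command for a FastPitch checkpoint plus a HiFi-GAN vocoder; it does not run anything. The flag names follow upstream `TTS.bin.synthesize` and should be checked against the fork's `--help`. The helper name `build_synthesize_cmd`, the checkpoint directory layout, the file names, and the speaker name `female` are hypothetical placeholders for whatever the downloaded files actually contain.

```python
import shlex
from pathlib import Path
from typing import List


def build_synthesize_cmd(text: str, model_dir: Path, speaker: str, out_path: Path) -> List[str]:
    """Assemble a coqui-ai/TTS synthesize call (hypothetical helper).

    Assumed layout: model_dir/fastpitch/{best_model.pth,config.json,speakers.pth}
    and model_dir/hifigan/{best_model.pth,config.json}. Adjust to the real files.
    """
    fp = model_dir / "fastpitch"  # acoustic model files (assumed location)
    hg = model_dir / "hifigan"    # vocoder files (assumed location)
    return [
        "python3", "-m", "TTS.bin.synthesize",
        "--text", text,
        "--model_path", str(fp / "best_model.pth"),
        "--config_path", str(fp / "config.json"),
        "--speakers_file_path", str(fp / "speakers.pth"),
        "--speaker_idx", speaker,
        "--vocoder_path", str(hg / "best_model.pth"),
        "--vocoder_config_path", str(hg / "config.json"),
        "--out_path", str(out_path),
    ]


if __name__ == "__main__":
    # Replace the sample text with input in the target language's native script.
    cmd = build_synthesize_cmd("sample text", Path("checkpoints/hi"), "female", Path("out.wav"))
    print(shlex.join(cmd))
    # To execute (requires the installed fork and downloaded weights):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems with non-Latin input text when the command is passed to `subprocess.run`.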

Highlighted Details

Supports 13 Indian languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil, and Telugu.
Achieves significant improvements over existing models as measured by Mean Opinion Scores (MOS).
Models are open-sourced on the Bhashini platform.
The code builds on the coqui-ai/TTS library.
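As a reminder of what a MOS figure summarizes: it is the mean of listener ratings on a 1 to 5 scale, usually reported with a confidence interval. The minimal sketch below computes both using a normal-approximation 95% interval; this is one common convention and not necessarily the exact protocol used in the paper. The ratings are made-up toy data and the function name `mos_with_ci` is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    MOS is the arithmetic mean of 1-5 ratings; the half-width uses the normal
    approximation z * sample_std / sqrt(n) (an assumed, common convention).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


if __name__ == "__main__":
    toy_ratings = [4, 5, 4, 3, 5, 4, 4, 5]  # hypothetical listener scores
    m, h = mos_with_ci(toy_ratings)
    print(f"MOS = {m:.2f} +/- {h:.2f}")     # MOS = 4.25 +/- 0.49
```

With so few ratings the interval is wide; published MOS evaluations pool many listeners and utterances to narrow it.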

Maintenance & Community

Developed by AI4Bharat, a mission-driven initiative.
Accepted at ICASSP 2023.

Licensing & Compatibility

The README does not explicitly state a license. The upstream coqui-ai/TTS library is released under the Mozilla Public License 2.0 (MPL-2.0), but this fork's license requires verification.

Limitations & Caveats

The setup instructions require manually patching the Trainer library, which suggests potential instability or ongoing development.
The license is not clearly specified, which may impact commercial use.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

12 stars in the last 30 days