MahaTTS by dubverse-ai

Open-source TTS model for multilingual voice cloning

Created 2 years ago

275 stars

Top 94.3% on SourcePulse

Project Summary

MahaTTS is an open-source, large-scale text-to-speech (TTS) model developed by Dubverse.ai that offers multilingual voice cloning and cross-lingual prosody transfer. It targets researchers and developers who need zero-shot voice cloning and style transfer across languages, and its pretrained checkpoints are available for commercial use.

How It Works

MahaTTS draws inspiration from Tortoise TTS but uses the SeamlessM4T wav2vec2 model for semantic token extraction. Because this wav2vec2 variant was trained on multilingual data, it helps the model scale across languages. The architecture has three stages: a Text-to-Semantic model (84M parameters, causal LM), a Semantic-to-MelSpec diffusion model (430M parameters), and a HiFi-GAN vocoder (13M parameters) that generates the audio waveform.
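The three-stage layout above can be sketched as plain data to make the data flow and the parameter budget explicit. This is an illustrative sketch only: the `Stage` class, the `PIPELINE` tuple, and the input/output labels are hypothetical names chosen here, not part of the MahaTTS codebase, and the sketch leaves out the reference audio used for voice cloning. Only the parameter counts come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the pipeline (hypothetical helper, not a MahaTTS class)."""
    name: str
    params_millions: int  # parameter count as stated in the summary
    consumes: str         # simplified label for the stage input
    produces: str         # simplified label for the stage output


# Stage order and sizes as described in the text; the I/O labels are assumptions.
PIPELINE = (
    Stage("Text-to-Semantic (causal LM)", 84, "text", "semantic tokens"),
    Stage("Semantic-to-MelSpec (diffusion)", 430, "semantic tokens", "mel spectrogram"),
    Stage("HiFi-GAN vocoder", 13, "mel spectrogram", "waveform"),
)


def total_params_millions(stages) -> int:
    """Sum the stated parameter counts, in millions."""
    return sum(stage.params_millions for stage in stages)


def is_chained(stages) -> bool:
    """Check that each stage's output label matches the next stage's input label."""
    return all(a.produces == b.consumes for a, b in zip(stages, stages[1:]))


if __name__ == "__main__":
    total = total_params_millions(PIPELINE)
    for stage in PIPELINE:
        share = 100 * stage.params_millions / total
        print(f"{stage.name}: {stage.params_millions}M ({share:.1f}% of total)")
    print(f"Total: {total}M parameters, chained: {is_chained(PIPELINE)}")
```

By the stated counts, the three stages sum to 527M parameters, and the diffusion model accounts for most of that budget.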

Quick Start & Requirements

Install via pip: pip install git+https://github.com/dubverse-ai/MahaTTS.git
Requires PyTorch and a CUDA-enabled GPU for optimal performance.
Example usage and pretrained models are available on Hugging Face.
A Colab notebook is provided for quick experimentation.
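Before installing, it can help to confirm the PyTorch and GPU requirement listed above. The following is a minimal sketch of such a check. It does not call any MahaTTS code, and the `describe_environment` function is a hypothetical helper written for this example. It relies only on the standard library plus `torch.cuda.is_available()` when PyTorch is present.

```python
import importlib.util


def describe_environment() -> dict:
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    info = {"torch_installed": False, "cuda_available": False}

    # find_spec looks the package up without importing it.
    if importlib.util.find_spec("torch") is None:
        return info

    try:
        import torch
    except (ImportError, OSError):
        # A broken or partial install is treated the same as a missing one.
        return info

    info["torch_installed"] = True
    info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    env = describe_environment()
    if not env["torch_installed"]:
        print("PyTorch not found; install it before installing MahaTTS.")
    elif not env["cuda_available"]:
        print("PyTorch found, but no CUDA GPU is visible; expect slow inference.")
    else:
        print("PyTorch and a CUDA GPU are available.")
```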

Highlighted Details

Supports voice cloning for both seen and unseen speaker identities.
Enables multilingual and cross-lingual voice cloning with prosody transfer.
Released "Smolie English" (trained on 9k hours of English data) and "Smolie Indic" (400 hours across 9 Indian languages).
Future plans include a 1B parameter model trained on 20K hours across 15 languages.

Maintenance & Community

The project is under active development, with ongoing work to improve robustness and reduce latency.
Updates may be slow while the team trains larger models.
Contributions for inference optimization are welcomed.

Licensing & Compatibility

Licensed under the Apache 2.0 License.
Pretrained model checkpoints are available for commercial use.

Limitations & Caveats

Latency is noted as an ongoing issue. The project is actively training larger models, suggesting potential for breaking changes or API shifts in future releases.


Explore Similar Projects

easevoice-trainer by megaease

cosyvoice-api by jianchang512

Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab

FireRedTTS by FireRedTeam

Multilingual_Text_to_Speech by Tomiinek

WhisperSpeech by WhisperSpeech

metavoice-src by metavoiceio

VITS-fast-fine-tuning by Plachtaa

VALL-E-X by Plachtaa

CosyVoice by FunAudioLLM

OpenVoice by myshell-ai

GPT-SoVITS by RVC-Boss