Open-source TTS model for multilingual voice cloning
MahaTTS is an open-source, large-scale text-to-speech (TTS) model developed by Dubverse.ai, offering multilingual voice cloning and cross-lingual prosody transfer. It is designed for researchers and developers seeking advanced speech synthesis capabilities, including zero-shot voice cloning and style transfer across languages, with pre-trained checkpoints available for commercial use.
How It Works
MahaTTS draws inspiration from Tortoise TTS but uses the wav2vec2 model from Seamless M4T for semantic token extraction. Because this wav2vec2 variant was trained on multilingual data, it helps the model scale across languages. The architecture comprises three stages: a Text-to-Semantic model (84M parameters, causal LM), a Semantic-to-MelSpec diffusion model (430M parameters), and a HiFi-GAN vocoder (13M parameters) that generates the audio waveform.
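To make the three-stage layout concrete, here is a minimal sketch that records each stage with the parameter count quoted above and checks that one stage's output feeds the next. The `Stage` class, the helper names, and the data-type labels are illustrative assumptions, not part of the MahaTTS codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical descriptor for one pipeline stage (not a MahaTTS class)."""
    name: str
    params_millions: int  # parameter count as quoted in the summary above
    consumes: str         # illustrative label for the stage's input
    produces: str         # illustrative label for the stage's output


# Stage order and sizes follow the description above.
PIPELINE = [
    Stage("Text-to-Semantic (causal LM)", 84, "text", "semantic tokens"),
    Stage("Semantic-to-MelSpec (diffusion)", 430, "semantic tokens", "mel spectrogram"),
    Stage("HiFi-GAN vocoder", 13, "mel spectrogram", "waveform"),
]


def total_params_millions(stages):
    """Sum the parameter counts across all stages, in millions."""
    return sum(stage.params_millions for stage in stages)


def is_chained(stages):
    """Return True if each stage consumes what the previous stage produces."""
    return all(a.produces == b.consumes for a, b in zip(stages, stages[1:]))


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.name}: {stage.consumes} -> {stage.produces} "
              f"({stage.params_millions}M parameters)")
    print(f"Total: {total_params_millions(PIPELINE)}M parameters")
```

Under these figures the diffusion stage accounts for most of the roughly 527M total parameters, so it is the natural first place to look when profiling inference cost.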
Quick Start & Requirements
pip install git+https://github.com/dubverse-ai/MahaTTS.git
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Latency is noted as an ongoing issue. The project is actively training larger models, suggesting potential for breaking changes or API shifts in future releases.
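One way to quantify the latency concern is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio. The sketch below is generic and does not call MahaTTS; `fake_synthesize` is a hypothetical stand-in for whatever synthesis function is being measured, and the 22050 Hz sample rate is an assumption chosen for illustration.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Return elapsed synthesis time divided by generated audio duration.

    `synthesize` is any callable that takes text and returns a sequence
    of audio samples; values below 1.0 mean faster than real time.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    if len(samples) == 0:
        raise ValueError("synthesize() returned no audio samples")

    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds


def fake_synthesize(text):
    """Hypothetical stand-in: sleeps briefly, then returns 1 s of silence."""
    time.sleep(0.01)
    return [0.0] * 22050  # assumed sample rate of 22050 Hz


if __name__ == "__main__":
    rtf = real_time_factor(fake_synthesize, "Hello, world.", sample_rate=22050)
    print(f"Real-time factor: {rtf:.3f}")
```

Swapping `fake_synthesize` for a real synthesis call, and running it over several sentence lengths, gives a baseline that can be compared across releases.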