MiraTTS by ysharma3501

Fast, high-fidelity TTS generation

Created 2 months ago

503 stars

Top 62.0% on SourcePulse

Project Summary

MiraTTS is a fast, high-quality text-to-speech (TTS) repository that finetunes the Spark-TTS model for improved realism and stability. It targets engineers and researchers who want clear, crisp 48kHz audio generated at over 100x realtime speed while fitting within 6GB of VRAM. The author positions its quality as rivaling closed-source models, while keeping inference efficient and latency low.

How It Works

This project builds on a finetuned Spark-TTS model and adds two optimizations: LMDeploy and FlashSR. LMDeploy, an LLM inference and serving toolkit, provides efficient inference and batching, which is the source of the over-100x-realtime speedup. FlashSR is an audio super-resolution model that upsamples the generated speech to 48kHz, producing the high-fidelity output. Together they let MiraTTS combine high output quality with high throughput and low latency.
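To make the sample-rate step concrete, the sketch below upsamples a signal by naive linear interpolation. It is not FlashSR, which is a learned super-resolution model; it only illustrates that going from a lower codec rate (assumed here to be 16kHz) to 48kHz triples the sample count while keeping the duration fixed, and that plain interpolation cannot add the high-frequency detail a learned model is meant to restore. The function name `upsample_linear` and the 16kHz source rate are assumptions for illustration.

```python
from typing import List, Sequence


def upsample_linear(
    samples: Sequence[float],
    src_rate: int = 16_000,  # assumed codec output rate, for illustration only
    dst_rate: int = 48_000,  # MiraTTS's stated output rate
) -> List[float]:
    """Naive integer-factor upsampling by linear interpolation (NOT FlashSR)."""
    if dst_rate % src_rate:
        raise ValueError("this sketch only handles integer upsampling factors")
    factor = dst_rate // src_rate
    out: List[float] = []
    for i, a in enumerate(samples):
        # Interpolate toward the next sample; hold the last sample at the edge.
        b = samples[i + 1] if i + 1 < len(samples) else a
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    # len(out) == len(samples) * factor, so duration (len / rate) is unchanged.
    return out
```

For example, `upsample_linear([0.0, 3.0])` returns `[0.0, 1.0, 2.0, 3.0, 3.0, 3.0]`: three times as many samples, but no new spectral content.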

Quick Start & Requirements

Installation is a single command: uv pip install git+https://github.com/ysharma3501/MiraTTS.git. The package requires Python and uses libraries such as librosa for audio file handling, and it downloads model weights from Hugging Face (YatharthS/MiraTTS). GPU requirements are not stated explicitly, but the 6GB VRAM figure suggests a GPU is expected. The README also links to blog posts explaining LLM-based TTS models and the optimization techniques used.
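The summary does not describe how MiraTTS returns audio, so this sketch assumes generated speech arrives as a sequence of mono float samples in [-1.0, 1.0] and saves it as a 48kHz, 16-bit PCM WAV using only Python's standard library (`wave`, `struct`). `pcm16_wav_bytes` and `write_wav` are hypothetical helpers, not part of MiraTTS; a library such as soundfile could do the same job in a project that already depends on it.

```python
import io
import struct
import wave
from typing import Iterable


def pcm16_wav_bytes(samples: Iterable[float], sample_rate: int = 48_000) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as an in-memory 16-bit PCM WAV.

    Hypothetical helper: assumes the TTS output is a plain sequence of floats.
    """
    # Clip to the valid range, then scale to signed 16-bit integers.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)  # 48kHz by default, matching MiraTTS output
        wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def write_wav(path: str, samples: Iterable[float], sample_rate: int = 48_000) -> None:
    """Write the encoded WAV bytes to a file on disk (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(pcm16_wav_bytes(samples, sample_rate))
```

Encoding in memory first keeps the function easy to test and lets the same bytes be sent over a network instead of written to disk.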

Highlighted Details

Achieves over 100x realtime audio generation speeds.
Produces high-quality, clear 48kHz audio outputs.
Operates efficiently within 6GB of VRAM.
Offers low latency, potentially as low as 100ms.
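The throughput and latency figures above measure different things. A realtime factor of 100x means about ten seconds of audio are produced in roughly 0.1 s of compute, while latency is the delay before audio becomes available. The sketch below computes the realtime factor from a sample count and wall-clock time and times a single call to a generation function; `generate` stands in for whatever MiraTTS entry point you use, and it and the helper names are hypothetical.

```python
import time
from typing import Callable, Sequence, Tuple


def realtime_factor(num_samples: int, sample_rate: int, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock compute."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / wall_seconds


def time_generation(
    generate: Callable[[str], Sequence[float]],  # hypothetical: text -> mono samples
    text: str,
    sample_rate: int = 48_000,
) -> Tuple[float, float]:
    """Return (elapsed_seconds, realtime_factor) for one non-streaming call.

    For a non-streaming call like this, elapsed time is also when the first
    audio becomes available, so it doubles as a latency measurement.
    """
    start = time.perf_counter()
    samples = generate(text)
    elapsed = time.perf_counter() - start
    return elapsed, realtime_factor(len(samples), sample_rate, elapsed)
```

For example, 480,000 samples at 48kHz is 10 s of audio, so generating it in 0.1 s gives a realtime factor of about 100. Discarding the first (warm-up) call and averaging several runs gives a fairer figure.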

Maintenance & Community

The repository acknowledges the authors of Spark-TTS and Unsloth. No community channels such as Discord or Slack are listed; an email address is provided for contact. Planned work includes low-latency streaming support and a native 48kHz BiCodec release, so neither is available yet.

Licensing & Compatibility

The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README content.

Limitations & Caveats

Low-latency streaming and native 48kHz BiCodec generation are listed as future work and are not currently available; today's 48kHz output relies on FlashSR upsampling. The absence of an explicit license is a significant adoption blocker, so licensing should be clarified with the author before commercial or widespread use.

Explore Similar Projects

marvis-tts by Marvis-Labs

T5Gemma-TTS by Aratako

VITA-Audio by VITA-MLLM

FireRedTTS by FireRedTeam

Chatterbox-TTS-Extended by petermg

soprano by ekwek1

LuxTTS by ysharma3501

xtts-api-server by daswer123

VieNeu-TTS by pnnbao97

insanely-fast-whisper by Vaibhavs10

VoxCPM by OpenBMB

bark by suno-ai