MiraTTS (ysharma3501): Fast, high-fidelity TTS generation
Summary
MiraTTS is a fast, high-quality text-to-speech (TTS) project that fine-tunes the Spark-TTS model for greater realism and stability. It targets engineers and researchers who need clear, crisp 48kHz audio generated at over 100x realtime speed, even within 6GB of VRAM. The project claims quality rivaling closed-source models while staying efficient and keeping latency low.
How It Works
MiraTTS pairs a fine-tuned version of the Spark-TTS model with two optimizations, LMDeploy and FlashSR. LMDeploy provides efficient, batched inference, which is where the over-100x realtime speed comes from. FlashSR, an audio super-resolution model, raises the output to high-fidelity 48kHz. Together they let MiraTTS deliver high-quality speech with low latency.
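To make the "over 100x realtime" figure concrete: the speedup is the duration of the generated audio divided by the wall-clock time taken to generate it. The following is a minimal sketch of that arithmetic. The function names and the sample numbers are illustrative assumptions, not measurements or APIs from MiraTTS.

```python
def realtime_speedup(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio produced per second of wall-clock compute."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def samples_for(audio_seconds: float, sample_rate_hz: int = 48_000) -> int:
    """Number of PCM samples in a mono clip of the given length."""
    return round(audio_seconds * sample_rate_hz)


# Hypothetical numbers: 60 s of speech generated in 0.5 s of wall time.
speedup = realtime_speedup(60.0, 0.5)
print(f"{speedup:.0f}x realtime")
print(samples_for(60.0), "samples at 48 kHz")
```

With batching, the audio duration in the numerator is the total across all items in the batch, which is how batched inference can push the ratio well past what a single request achieves.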
Quick Start & Requirements
Install with: uv pip install git+https://github.com/ysharma3501/MiraTTS.git. The project requires Python and libraries such as librosa for audio file handling, and it downloads model weights from Hugging Face (YatharthS/MiraTTS). The README does not state hardware requirements explicitly, though the 6GB VRAM figure suggests a GPU is expected. It also links to blog posts explaining LLM-based TTS models and optimization techniques.
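Once audio has been generated and saved, it can be worth confirming that the file really is 48kHz. The sketch below uses only Python's standard-library wave module rather than librosa, and it writes a short silent clip as a stand-in for real model output. The file name and helper names are hypothetical, and nothing here calls the MiraTTS API.

```python
import os
import tempfile
import wave


def write_silence(path: str, seconds: float = 0.1, sample_rate_hz: int = 48_000) -> None:
    """Write a mono 16-bit PCM WAV of silence (stand-in for generated speech)."""
    n_frames = round(seconds * sample_rate_hz)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames)


def wav_info(path: str) -> tuple[int, float]:
    """Return (sample rate in Hz, duration in seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


# Hypothetical output path; replace with the file your TTS run produced.
out_path = os.path.join(tempfile.gettempdir(), "mira_example.wav")
write_silence(out_path)
rate, duration = wav_info(out_path)
print(rate, round(duration, 3))
```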
Maintenance & Community
The repository credits the authors of Spark-TTS and Unsloth. No community channels such as Discord or Slack are listed; an email address is provided for contact. Planned work includes low-latency streaming and a native 48kHz BiCodec, neither of which is implemented yet.
Licensing & Compatibility
The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README content.
Limitations & Caveats
Low-latency streaming and native 48kHz BiCodec generation are listed as future work and are not currently available. The lack of explicit licensing information is a significant adoption blocker and needs clarification before commercial or widespread use.