MiraTTS  by ysharma3501

Fast, high-fidelity TTS generation

Created 3 weeks ago

448 stars

Top 67.0% on SourcePulse

Project Summary

Summary

MiraTTS is a fast, high-quality text-to-speech (TTS) model finetuned from Spark-TTS for improved realism and stability. It targets engineers and researchers who want clear, crisp 48kHz audio generated at over 100x realtime speed within 6GB of VRAM. The project presents itself as rivaling closed-source models in quality while staying efficient and low-latency.

How It Works

The project combines a finetuned Spark-TTS model with two optimizations, LMDeploy and FlashSR. LMDeploy provides the speedup, reaching over 100x realtime through efficient inference and batching. FlashSR improves audio quality, producing high-fidelity 48kHz output. Together they let MiraTTS pair high output quality with high throughput and low latency.
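To make the "100x realtime" figure concrete, the sketch below computes a realtime factor as seconds of audio produced per second of wall-clock compute. The function names and the sample numbers are hypothetical illustrations, not part of MiraTTS or its benchmarks.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio generated per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def meets_target(audio_seconds: float, wall_seconds: float,
                 target: float = 100.0) -> bool:
    """Check whether a run reaches the target realtime factor."""
    return realtime_factor(audio_seconds, wall_seconds) >= target


# Hypothetical run: 50 s of audio generated in 0.5 s of compute.
factor = realtime_factor(50.0, 0.5)
print(f"{factor:.0f}x realtime")
```

Read this way, 100x realtime means one minute of speech takes about 0.6 seconds to generate. Throughput figures like this usually depend on batch size, so a single short request may see a lower factor.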

Quick Start & Requirements

Installation is a single command: uv pip install git+https://github.com/ysharma3501/MiraTTS.git
The project requires Python and libraries such as librosa for audio file handling, and it downloads model weights from Hugging Face (YatharthS/MiraTTS). Hardware requirements are not stated explicitly; the quoted 6GB VRAM figure suggests a GPU is assumed. The README also links to blog posts explaining LLM-based TTS models and optimization techniques.

Highlighted Details

  • Achieves over 100x realtime audio generation speeds.
  • Produces high-quality, clear 48kHz audio outputs.
  • Operates efficiently within 6GB of VRAM.
  • Offers low latency, potentially as low as 100ms.
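One way to confirm that a generated file really is 48kHz is to read its header. The sketch below uses only Python's standard-library wave module; the helper names and the silent demo file are hypothetical stand-ins for a real MiraTTS output saved as WAV.

```python
import os
import tempfile
import wave


def write_silence(path: str, sample_rate: int = 48000,
                  seconds: float = 0.1) -> None:
    """Write a mono, 16-bit WAV of silence (stand-in for generated audio)."""
    n_frames = int(sample_rate * seconds)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * n_frames)


def wav_sample_rate(path: str) -> int:
    """Return the sample rate stored in a WAV file's header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()


demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path)
print(wav_sample_rate(demo_path))
```

The header only reports the container's sample rate. It does not show whether the content above 12kHz or so is genuine detail or the result of upsampling.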

Maintenance & Community

The repository acknowledges the authors of Spark-TTS and unsloth. No community support channels such as Discord or Slack are listed; an email address is provided for contact. Planned work includes low-latency streaming support and the release of a native 48kHz bicodec, neither of which is implemented yet.

Licensing & Compatibility

The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README content.

Limitations & Caveats

Low-latency streaming and native 48kHz bicodec generation are listed as future work, so they are not currently available. The missing license information is a significant adoption blocker and needs clarification before commercial or widespread use.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 14
  • Star History: 449 stars in the last 25 days
