Matcha-TTS by shivammehta25

TTS architecture research paper using conditional flow matching

created 1 year ago
1,072 stars

Top 35.9% on sourcepulse

Project Summary

Matcha-TTS is a fast, non-autoregressive text-to-speech (TTS) architecture designed for natural-sounding speech synthesis. It targets researchers and developers seeking efficient TTS solutions, offering probabilistic generation with a compact memory footprint and rapid synthesis times.

How It Works

Matcha-TTS uses conditional flow matching, a technique similar to rectified flows, to speed up ODE-based speech synthesis. The model learns a vector field that transports noise to speech, so generation is probabilistic: different noise samples yield different renditions of the same text. Because the architecture is non-autoregressive, inference is faster than with traditional autoregressive models while audio quality remains high.
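
The idea can be illustrated with a minimal sketch. It assumes the optimal-transport form of conditional flow matching with a straight noise-to-data path; that specific path, the SIGMA_MIN value, and every function name here are illustrative assumptions rather than details taken from the project. Scalars stand in for speech features, and a closed-form "oracle" field stands in for the trained network.

```python
SIGMA_MIN = 1e-4  # assumed noise floor; illustrative value, not from the summary


def cfm_training_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, u_t) for noise x0, data x1, and time t in [0, 1].

    x_t lies on a straight path from x0 to (approximately) x1, and u_t is
    the constant velocity along that path, i.e. the regression target a
    network would be trained to predict from (x_t, t).
    """
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1.0 - sigma_min) * x0
    return x_t, u_t


def make_oracle_field(x1, sigma_min=SIGMA_MIN):
    """Hypothetical stand-in for the trained network: the exact conditional
    velocity for one known target x1, recovered from the pair (x, t)."""
    def field(x, t):
        return (x1 - (1.0 - sigma_min) * x) / (1.0 - (1.0 - sigma_min) * t)
    return field


def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * vector_field(x, i * dt)
    return x


if __name__ == "__main__":
    noise, target = -0.7, 2.5
    field = make_oracle_field(target)
    for steps in (1, 2, 10):
        # Every step count lands next to the target: the path is a straight line.
        print(steps, euler_sample(field, noise, steps))
```

In this toy setting the path is exactly straight, so even a single Euler step reaches the endpoint. A trained network only approximates the field averaged over many targets, so a real model would generally need several steps; the sketch only shows why straighter paths allow fewer of them.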

Quick Start & Requirements

  • Install via pip: pip install matcha-tts or from source.
  • Requires Python 3.10 and PyTorch 2.0+.
  • Pre-trained models are downloaded automatically.
  • Demo available on HuggingFace Spaces.
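
The version requirements above can be checked before installing. The following is a rough sketch using only the Python standard library; the helper names (parse_version, meets_minimum, check_environment) are hypothetical and not part of Matcha-TTS, and treating Python 3.10 as an exact match is an assumption based on the wording above.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into a tuple like (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits, drop 'rc1', 'a0', ...
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '2' compares like '2.0.0'
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum tuple."""
    return parse_version(installed) >= tuple(minimum)


def check_environment(min_torch=(2, 0)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        found = "%d.%d" % sys.version_info[:2]
        problems.append("Python 3.10 expected, found " + found)
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, min_torch):
            wanted = ".".join(str(n) for n in min_torch)
            problems.append("PyTorch >= " + wanted + " expected, found " + torch_version)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Calling check_environment(min_torch=(2, 1)) applies the stricter PyTorch bound noted for ONNX export under Limitations & Caveats.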

Highlighted Details

  • Utilizes conditional flow matching for fast, non-autoregressive TTS.
  • Probabilistic generation with a compact memory footprint.
  • Achieves highly natural-sounding speech.
  • Supports ONNX export and inference for deployment.

Maintenance & Community

The project is associated with KTH Royal Institute of Technology. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

ONNX export requires PyTorch >= 2.1.0, because an operator the model relies on cannot be exported with older versions. Users who need to export models must install that version manually. The project is presented as the official implementation of an ICASSP 2024 paper.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 91 stars in the last 90 days
