nix-tts by rendchevi

Lightweight TTS model from a research paper, built via module-wise distillation

Created 3 years ago
257 stars

Top 98.4% on SourcePulse

Project Summary

Nix-TTS offers a lightweight, end-to-end text-to-speech (TTS) solution by distilling knowledge from a larger, high-quality teacher model. It targets researchers and developers needing efficient TTS capabilities on resource-constrained devices, providing significant speedups and parameter reduction while maintaining reasonable voice quality.

How It Works

Nix-TTS employs module-wise knowledge distillation, a technique that allows for flexible and independent transfer of learned representations from a teacher model to specific components (encoder and decoder) of the student model. This approach enables the student model to inherit the non-autoregressive and vocoder-free characteristics of the teacher, resulting in a compact yet performant TTS system.
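The idea of distilling each module independently can be illustrated with a toy sketch. This is not the Nix-TTS code: the scalar "teacher" functions, the linear students, and every name below are hypothetical stand-ins, and the real system uses neural networks with its own losses. The sketch only shows that the student encoder and the student decoder can each be fitted against the corresponding teacher module without depending on each other.

```python
import random

# Hypothetical frozen teacher modules (toy stand-ins, not Nix-TTS code).
def teacher_encoder(x):
    return 2.0 * x + 1.0       # input feature -> latent

def teacher_decoder(z):
    return -0.5 * z + 3.0      # latent -> output sample

def distill_module(teacher_fn, inputs, lr=0.05, epochs=500):
    """Fit a linear student y = w*x + b to one teacher module's outputs
    by minimising mean squared error with plain gradient descent."""
    w, b = 0.0, 0.0
    targets = [teacher_fn(x) for x in inputs]
    n = len(inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(inputs, targets):
            err = (w * x + b) - t
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(64)]

# Encoder distillation: the student sees the same inputs as the teacher encoder.
enc_w, enc_b = distill_module(teacher_encoder, xs)

# Decoder distillation: the student is trained on the teacher's latents,
# so it does not depend on how well the student encoder was trained.
zs = [teacher_encoder(x) for x in xs]
dec_w, dec_b = distill_module(teacher_decoder, zs)

def student_tts(x):
    """Chain the two independently distilled student modules."""
    z = enc_w * x + enc_b
    return dec_w * z + dec_b
```

Because each module has its own distillation target, either student module could be redesigned or retrained without touching the other, which is the flexibility the paragraph above describes.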

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Install espeak: sudo apt-get install espeak
  • Download pre-trained models from the provided link.
  • Official Demo: 🤗 Interactive Demo
  • Audio Samples: 📢 Audio Samples
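Since espeak is a system package rather than a pip dependency, it can be worth confirming it is reachable before running anything. The following is a minimal sketch using only the Python standard library; the helper name `espeak_available` is hypothetical and not part of the repository.

```python
import shutil

def espeak_available(binary="espeak"):
    """Return True if an executable named `binary` is found on PATH."""
    return shutil.which(binary) is not None

if not espeak_available():
    print("espeak not found on PATH; install it with: sudo apt-get install espeak")
```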

Highlighted Details

  • Achieves 5.23M parameters, up to an 89.34% reduction compared to the teacher model.
  • Offers inference speedups of 3.04x on Intel-i7 CPU and 8.36x on Raspberry Pi 3B.
  • Retains non-autoregressive and end-to-end (vocoder-free) properties.
  • Module-wise distillation allows for flexible student model design.
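As a quick sanity check on the parameter figures, the two numbers above imply a teacher size. The sketch below only rearranges the stated values; the resulting teacher parameter count is derived arithmetic, not a figure quoted from the source, and the function name is hypothetical.

```python
def implied_teacher_params(student_params_m, reduction_pct):
    """Teacher size (in millions) implied by a student size and a percent reduction."""
    return student_params_m / (1.0 - reduction_pct / 100.0)

# 5.23M student parameters at an 89.34% reduction.
teacher_m = implied_teacher_params(5.23, 89.34)
print(f"implied teacher size: about {teacher_m:.2f}M parameters")
```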

Maintenance & Community

  • Research funded by Kata.ai, with which the authors are affiliated.
  • Adapted components from VITS and Comprehensive-Transformer-TTS.
  • Paper Link: 📄 Paper Link

Licensing & Compatibility

  • The repository does not explicitly state a license, so permission for commercial use or closed-source linking is unspecified. The README mentions funding by Kata.ai, which may imply additional restrictions.

Limitations & Caveats

The repository does not specify a license, which may impact commercial use. The README reports a speedup on Raspberry Pi 3B relative to the teacher model, yet the provided table indicates that inference there is still slower than real time (0.50x). Naturalness and intelligibility are described as "fair" compared to the teacher model, suggesting a quality trade-off for size and speed.
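A relative speedup and a real-time figure measure different things, so both statements can hold at once. The sketch below assumes that "0.50x" means 0.50 seconds of audio produced per second of compute; that reading is an assumption, as are the function and variable names, and the teacher figure is derived rather than quoted.

```python
def teacher_throughput(student_throughput, speedup):
    """Teacher throughput (audio seconds per compute second) implied by a student speedup."""
    return student_throughput / speedup

student_x = 0.50   # assumed meaning: 0.50 s of audio per 1 s of compute
speedup = 8.36     # stated speedup over the teacher on Raspberry Pi 3B

teacher_x = teacher_throughput(student_x, speedup)
seconds_per_audio_second = 1.0 / student_x

print(f"student needs {seconds_per_audio_second:.1f} s of compute per second of audio")
print(f"implied teacher throughput: about {teacher_x:.3f}x real time")
```

Under that assumption the student is much faster than the teacher on this device while still taking about twice as long as the audio it produces.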

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
