tiny-tts by tronghieuit

Ultra-lightweight English Text-to-Speech model

Created 2 months ago
413 stars

Top 70.4% on SourcePulse

Project Summary

This project provides an ultra-lightweight, end-to-end English Text-to-Speech (TTS) model, TinyTTS, designed for resource-constrained environments. With approximately 1.6 million parameters and a ~3.4 MB ONNX checkpoint, it enables natural-sounding speech synthesis on CPU-only machines, edge devices, and embedded systems, offering a significant reduction in computational and memory requirements compared to conventional TTS solutions.

How It Works

TinyTTS employs an end-to-end architecture, eliminating the need for separate vocoder components often found in larger TTS systems. Its core advantage lies in its extreme parameter efficiency and small model size, achieved through optimized neural network design. Inference is further accelerated using ONNX Runtime, which fuses operations and reduces overhead, enabling high-speed synthesis even on modest hardware.
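
As a rough illustration of the ONNX Runtime side, the sketch below creates a CPU-only inference session with all graph optimizations (including operator fusion) enabled. It is not taken from the TinyTTS code: the model path passed in is hypothetical, and because the checkpoint's input names are not listed here, the helper only inspects them instead of guessing a feed dictionary.

```python
def load_cpu_session(model_path):
    """Create a CPU-only ONNX Runtime session with all graph optimizations on."""
    # Imported lazily so this helper can be defined without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # ORT_ENABLE_ALL turns on basic, extended (fusion), and layout optimizations.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    return ort.InferenceSession(
        model_path,  # hypothetical path, e.g. a locally downloaded .onnx file
        sess_options=options,
        providers=["CPUExecutionProvider"],
    )


def describe_inputs(session):
    """Return (name, shape, type) for each model input, to see what the graph expects."""
    return [(i.name, i.shape, i.type) for i in session.get_inputs()]
```

Calling `describe_inputs(load_cpu_session(path))` on a real checkpoint would show which tensors the graph expects before any synthesis call is written.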

Quick Start & Requirements

Highlighted Details

  • Model Size: ~1.6M parameters and ~3.4 MB ONNX FP16 checkpoint, drastically smaller than typical TTS models (50M–200M+ parameters, 200 MB–1 GB+).
  • Performance: Achieves ~53x real-time synthesis on a laptop CPU using ONNX Runtime (92 ms to synthesize ~4.88 s of audio).
  • Sample Rate: Outputs 44.1 kHz audio for high-fidelity playback.
  • End-to-End: Simplifies the TTS pipeline by integrating all necessary components.
  • Cross-Platform: Available via both Python (PyPI) and Node.js (npm) packages, with the Node.js version offering zero Python dependencies.
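
The size and speed figures above can be sanity-checked with a few lines of arithmetic. This sketch uses only the numbers quoted in the list; treating the checkpoint as pure FP16 weights (2 bytes per parameter, decimal megabytes) is a simplifying assumption, and the function name is illustrative.

```python
PARAMS = 1_600_000      # ~1.6M parameters, as quoted above
BYTES_PER_FP16 = 2      # one FP16 value occupies 2 bytes

# Weight storage alone, in decimal megabytes.
weights_mb = PARAMS * BYTES_PER_FP16 / 1e6


def real_time_speedup(audio_seconds, synthesis_seconds):
    """How many seconds of audio are produced per second of compute."""
    return audio_seconds / synthesis_seconds


speedup = real_time_speedup(4.88, 0.092)  # 92 ms for ~4.88 s of audio

print(f"{weights_mb:.1f} MB of FP16 weights")
print(f"{speedup:.1f}x real time")
```

The weights come to about 3.2 MB, so the quoted ~3.4 MB checkpoint leaves roughly 0.2 MB, presumably for graph structure and metadata, and the timing figures reproduce the ~53x speedup.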

Maintenance & Community

The project lists several "TODO" items, including releasing public training code, adding more English speakers, and implementing ultra-lightweight zero-shot voice cloning. No specific community channels (like Discord or Slack) or notable contributors/sponsorships are detailed in the README.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. This permissive license is generally compatible with commercial use and integration into closed-source applications.

Limitations & Caveats

The model currently supports English only, and public training code is not yet available. The Node.js grapheme-to-phoneme (G2P) implementation is reported to achieve a 100% phoneme-level match with the Python version. Direct performance comparisons with other TTS engines should account for differences in output sample rate (e.g., Piper's 22 kHz vs. TinyTTS's 44.1 kHz), since a higher sample rate means more audio samples generated per second of speech.
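
One way to make the sample-rate caveat concrete is to compare engines by output samples generated per second of compute, not by real-time factor alone. The sketch below uses the TinyTTS timing quoted earlier; the 22,050 Hz value is an assumption standing in for "22 kHz", and the second calculation deliberately reuses the same duration and latency to isolate the effect of sample rate. It is not a measurement of any other engine.

```python
def samples_per_compute_second(audio_seconds, sample_rate_hz, synthesis_seconds):
    """Output samples produced per second of wall-clock synthesis time."""
    return audio_seconds * sample_rate_hz / synthesis_seconds


# TinyTTS figures quoted above: ~4.88 s of 44.1 kHz audio in 92 ms.
at_44k = samples_per_compute_second(4.88, 44_100, 0.092)

# Same duration and latency at an assumed 22,050 Hz output rate.
at_22k = samples_per_compute_second(4.88, 22_050, 0.092)

ratio = at_44k / at_22k
print(f"44.1 kHz output: {at_44k:,.0f} samples per compute second")
print(f"ratio vs 22.05 kHz at equal real-time factor: {ratio:.1f}")
```

At an identical real-time factor, the 44.1 kHz engine is producing twice as many samples, which is the adjustment a fair comparison needs to include.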

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3

Star History

106 stars in the last 30 days
