supertonic by supertone-inc

Lightning-fast, on-device Text-to-Speech (TTS)

Created 3 months ago
2,636 stars

Top 17.5% on SourcePulse

Project Summary

Supertonic is a lightning-fast, on-device text-to-speech (TTS) system built on ONNX Runtime for high performance with minimal computational overhead. Because synthesis runs entirely locally, it offers complete privacy and no network latency, targeting developers who need efficient TTS across diverse platforms without cloud dependencies.

How It Works

The system performs cross-platform, on-device inference through ONNX Runtime using a lightweight 66M-parameter model. Its core advantages are local processing, which ensures privacy and eliminates network latency, and native handling of complex text inputs without a separate pre-processing step.

Quick Start & Requirements

  • Install: Clone the repository, then download the ONNX models and voices from the Hugging Face Hub using Git LFS.
  • Prerequisites: Git LFS must be installed.
  • Run: Example implementations are provided for Python, Node.js, the browser (WebGPU/WASM), Java, C++, C#, Go, Swift, iOS, and Rust, each with its own build/run commands.
  • Docs/Demo: Interactive Demo (in-browser), Hugging Face Hub models, Raspberry Pi demo video.
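
The install steps above can be sketched as a short script. It is a dry run by default, printing each command for review before anything executes; note the Hugging Face model-repo path below is an assumption (the README links the exact repo), and the GitHub URL is inferred from the project and org names.

```python
import shlex
import subprocess

# Flip to True to actually execute the commands below.
EXECUTE = False

def plan_install() -> list[list[str]]:
    """Return the install commands described in the Quick Start."""
    return [
        # Prerequisite: Git LFS, so the ONNX model weights download fully.
        ["git", "lfs", "install"],
        # Sample code (URL inferred from the project/org names).
        ["git", "clone", "https://github.com/supertone-inc/supertonic.git"],
        # Models and voices from the Hugging Face Hub -- this repo path is
        # an ASSUMPTION; substitute the repo linked from the README.
        ["git", "clone", "https://huggingface.co/Supertone/supertonic"],
    ]

for cmd in plan_install():
    print("$", shlex.join(cmd))
    if EXECUTE:
        subprocess.run(cmd, check=True)
```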

Highlighted Details

  • Performance: Up to 167x faster than real time on consumer hardware (Apple M4 Pro CPU), with very low real-time factors (e.g., an RTF of 0.001 on an RTX 4090).
  • Lightweight: 66M parameters, optimized for minimal footprint.
  • On-Device: Runs entirely locally, guaranteeing privacy and eliminating network latency.
  • Natural Text Handling: Processes numbers, dates, currency, abbreviations, and complex expressions natively.
  • Multi-Platform: Broad ecosystem support including web, mobile, and server-side languages.
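
The performance figures above are expressed as real-time factors (RTF): wall-clock synthesis time divided by the duration of audio produced. A small sketch of the arithmetic, using only the numbers quoted in the bullets (not an official benchmark):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of generated audio.

    RTF < 1 means faster than real time; speedup over real time = 1 / RTF.
    """
    return synthesis_seconds / audio_seconds

# "167x faster than real time" (the M4 Pro CPU figure above) corresponds
# to an RTF of roughly 1/167, i.e. about 0.006.
print(round(1 / 167, 4))

# Conversely, the quoted RTF of 0.001 (RTX 4090) means synthesis runs at
# roughly 1000x real time.
print(round(1 / 0.001))
```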

Maintenance & Community

Copyright held by Supertone Inc. Associated research papers are recent (2025 arXiv preprints). No specific community channels or detailed maintenance status are provided in the README.

Licensing & Compatibility

The sample code is MIT licensed; the model is released under the OpenRAIL-M license; PyTorch (a training dependency) is BSD 3-Clause. MIT is generally permissive for commercial use, while OpenRAIL-M carries use-based restrictions that should be reviewed before deployment.

Limitations & Caveats

GPU-backed ONNX Runtime inference is noted as untested. The README does not document the project's bus factor or any deprecation notices.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 6
  • Star history: 127 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

moonshine by moonshine-ai (9.0%, 4k stars)

Speech-to-text models optimized for fast, accurate ASR on edge devices. Created 1 year ago; updated 2 days ago.