kitten_tts_rs by second-state

Lightweight, high-quality text-to-speech in Rust

Created 2 weeks ago · 277 stars · Top 93.6% on SourcePulse
Project Summary

This project provides a high-performance, lightweight Text-to-Speech (TTS) system implemented in Rust, offering a self-contained alternative to Python-based solutions like the original KittenTTS. It targets developers building AI-agent skills or real-time audio applications, as well as those deploying TTS on resource-constrained devices, delivering high-quality voice synthesis with minimal overhead and fast startup times.

How It Works

The core of kitten_tts_rs is a Rust port of the KittenTTS models, leveraging ONNX for CPU-optimized inference. It processes input text through normalization, phonemization (using espeak-ng), and token encoding before feeding it into the ONNX runtime. The implementation provides two distinct binaries: a command-line interface (CLI) for direct audio generation and an OpenAI-compatible API server for integration into applications. This Rust-native approach eliminates Python dependencies, drastically reducing binary size and improving startup performance. Optional GPU acceleration via Cargo features (CUDA, TensorRT, CoreML, DirectML) is also supported.
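The text front end described above can be sketched in Rust as three stages: normalize, phonemize, and encode to token ids. The function names and the vocabulary table below are illustrative assumptions, not the crate's actual API; the real pipeline drives espeak-ng for phonemization and feeds the resulting ids into an ONNX session.

```rust
/// Collapse whitespace and lowercase, as a stand-in for full text normalization.
fn normalize(text: &str) -> String {
    text.split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Placeholder phonemizer: one "phoneme" per character.
/// The real pipeline calls espeak-ng to produce IPA phonemes.
fn phonemize(text: &str) -> Vec<String> {
    text.chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_string())
        .collect()
}

/// Map each phoneme to its index in a (hypothetical) vocabulary table,
/// skipping anything out of vocabulary.
fn encode(phonemes: &[String], vocab: &[&str]) -> Vec<usize> {
    phonemes
        .iter()
        .filter_map(|p| vocab.iter().position(|v| *v == p.as_str()))
        .collect()
}

fn main() {
    let vocab = ["h", "i", " "];
    let ids = encode(&phonemize(&normalize("  Hi  ")), &vocab);
    // These ids are what would be handed to the ONNX runtime for inference.
    assert_eq!(ids, vec![0, 1]);
    println!("token ids: {:?}", ids);
}
```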

Quick Start & Requirements

Installation involves downloading pre-built binaries and model weights from the project's releases and Hugging Face, respectively. A system-level installation of espeak-ng is required for phonemization. The core binary is approximately 10MB, with model weights ranging from 25MB to 80MB. Official quick-start instructions and download links are provided within the README.
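A first run might look like the following. The binary name, flag names, and file names here are assumptions for illustration only; consult the README for the actual quick-start commands.

```shell
# espeak-ng must be installed system-wide for phonemization:
command -v espeak-ng >/dev/null || echo "espeak-ng not found: install it via your package manager"

# Then (hypothetically) synthesize speech with the downloaded binary and weights:
# ./kitten_tts --model kitten_tts.onnx --text "Hello from Rust" --output hello.wav
```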

Highlighted Details

  • Ultra-lightweight models (15M-80M parameters, 25-80MB disk) optimized for CPU inference.
  • Provides both a CLI tool and an OpenAI-compatible API server with SSE streaming.
  • Features 8 built-in voices, adjustable speech speed, and text preprocessing.
  • Achieves fast startup times (~100ms) and a tiny binary footprint (~10MB).
  • Supports optional GPU acceleration via Cargo features for CUDA, TensorRT, CoreML, and DirectML.
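Since the server is OpenAI-compatible, a synthesis request would presumably follow the shape of OpenAI's speech endpoint (`POST /v1/audio/speech`). The field values below, including the model and voice names, are illustrative assumptions rather than the project's documented identifiers:

```json
{
  "model": "kitten-tts",
  "input": "Hello from kitten_tts_rs!",
  "voice": "voice-1",
  "speed": 1.0
}
```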

Maintenance & Community

The project acknowledges contributions from KittenML, pyke/ort, and espeak-ng. Specific details regarding active maintainers, community channels (like Discord or Slack), or a public roadmap are not detailed in the provided README.

Licensing & Compatibility

The project is licensed under the Apache-2.0 license, consistent with the original KittenTTS. This permissive license allows for commercial use and integration into closed-source applications without significant restrictions.

Limitations & Caveats

While CoreML acceleration is available for Apple Silicon, benchmarks indicate it can be slower than CPU-only inference for smaller KittenTTS models and has limitations with dynamic tensor shapes. The AAC audio format is not yet supported. GPU acceleration requires specific build features and corresponding system-level SDKs.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 2
  • Star history: 277 stars in the last 17 days
