second-state: Lightweight, high-quality text-to-speech in Rust
This project provides a high-performance, lightweight Text-to-Speech (TTS) system implemented in Rust, offering a self-contained alternative to Python-based solutions like the original KittenTTS. It targets developers building AI agent skills, real-time audio applications, or deploying TTS on resource-constrained devices, delivering high-quality voice synthesis with minimal overhead and fast startup times.
How It Works
The core of kitten_tts_rs is a Rust port of the KittenTTS models, leveraging ONNX for CPU-optimized inference. It processes input text through normalization, phonemization (using espeak-ng), and token encoding before feeding it into the ONNX runtime. The implementation provides two distinct binaries: a command-line interface (CLI) for direct audio generation and an OpenAI-compatible API server for integration into applications. This Rust-native approach eliminates Python dependencies, drastically reducing binary size and improving startup performance. Optional GPU acceleration via Cargo features (CUDA, TensorRT, CoreML, DirectML) is also supported.
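The three text-processing stages described above (normalization, phonemization, token encoding) can be sketched as a chain of small functions. This is a minimal illustration, not the project's code: every function name here is hypothetical, the phonemizer is a stand-in that only lowercases its input (the real project delegates this step to espeak-ng), and the vocabulary is a toy table rather than the model's actual symbol set.

```rust
use std::collections::HashMap;

/// Stage 1 (hypothetical): normalization.
/// Here it only trims the text and collapses runs of whitespace.
fn normalize(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Stage 2 (stand-in): phonemization.
/// The real pipeline uses espeak-ng; lowercasing keeps this sketch self-contained.
fn phonemize_stub(text: &str) -> String {
    text.to_lowercase()
}

/// Stage 3 (hypothetical): token encoding.
/// Maps each symbol to an integer id and skips symbols missing from the vocabulary.
fn encode(phonemes: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    phonemes.chars().filter_map(|c| vocab.get(&c).copied()).collect()
}

/// Toy vocabulary (an assumption): space = 0, 'a'..='z' = 1..=26.
fn toy_vocab() -> HashMap<char, i64> {
    let mut vocab = HashMap::new();
    vocab.insert(' ', 0);
    for (i, c) in ('a'..='z').enumerate() {
        vocab.insert(c, i as i64 + 1);
    }
    vocab
}

/// Runs the three stages in order and returns the ids that would be
/// handed to the inference runtime as the model's input tensor.
fn text_to_token_ids(text: &str) -> Vec<i64> {
    let vocab = toy_vocab();
    encode(&phonemize_stub(&normalize(text)), &vocab)
}
```

Calling `text_to_token_ids("  Hi   there ")` with this toy vocabulary yields `[8, 9, 0, 20, 8, 5, 18, 5]`. In the actual project the resulting ids would come from the model's own symbol table and be passed to the ONNX runtime.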
Quick Start & Requirements
Installation involves downloading pre-built binaries and model weights from the project's releases and Hugging Face, respectively. A system-level installation of espeak-ng is required for phonemization. The core binary is approximately 10MB, with model weights ranging from 25MB to 80MB. Official quick-start instructions and download links are provided within the README.
Maintenance & Community
The project acknowledges contributions from KittenML, pyke/ort, and espeak-ng. The README does not identify active maintainers, community channels (such as Discord or Slack), or a public roadmap.
Licensing & Compatibility
The project is licensed under the Apache-2.0 license, consistent with the original KittenTTS. This permissive license allows commercial use and integration into closed-source applications, provided the license text and attribution notices are preserved in redistributions.
Limitations & Caveats
While CoreML acceleration is available for Apple Silicon, benchmarks indicate it can be slower than CPU-only inference for smaller KittenTTS models and has limitations with dynamic tensor shapes. The AAC audio format is not yet supported. GPU acceleration requires specific build features and corresponding system-level SDKs.