TTS/STT/STS library for efficient speech analysis on Apple Silicon
This library provides text-to-speech (TTS) and speech-to-speech (STS) capabilities leveraging Apple's MLX framework for efficient inference on Apple Silicon. It targets developers and users seeking fast, customizable speech synthesis with features like multiple voices, adjustable speed, and an interactive web interface with 3D audio visualization.
How It Works
MLX-Audio utilizes the MLX framework for accelerated computation on Apple Silicon, enabling fast inference for its TTS and STS models. It supports the Kokoro model architecture for multilingual TTS and the CSM model for voice cloning via reference audio. The library offers direct Python API access, a CLI for quick generation, and a FastAPI-based web server with a 3D audio visualizer.
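As a rough illustration of the direct Python API route, the sketch below collects the options mentioned above (voice, speed, language) into one dictionary and passes them to the library only when it is installed. The import path `mlx_audio.tts.generate.generate_audio`, its keyword names, the model identifier `prince-canuma/Kokoro-82M`, and the voice name `af_heart` are assumptions recalled from the project README and should be checked against the installed version. `build_tts_request` and `synthesize` are hypothetical helpers, not part of the library.

```python
import importlib.util


def build_tts_request(text, voice="af_heart", speed=1.0, lang_code="a"):
    """Collect keyword arguments for a TTS call.

    The keyword names and default values here are assumptions,
    not a confirmed description of the library's API.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if speed <= 0:
        raise ValueError("speed must be positive")
    return {
        "text": text,
        "model_path": "prince-canuma/Kokoro-82M",  # assumed model identifier
        "voice": voice,
        "speed": speed,
        "lang_code": lang_code,
    }


def synthesize(text, **options):
    """Run synthesis if mlx-audio is installed; otherwise return None."""
    request = build_tts_request(text, **options)
    if importlib.util.find_spec("mlx_audio") is None:
        return None  # library not available on this machine
    # Assumed entry point; verify against the installed package.
    from mlx_audio.tts.generate import generate_audio
    return generate_audio(**request)
```

Calling `synthesize("Hello, world", speed=1.2)` on an Apple Silicon machine with the package installed would be the intended use. Separating request building from the call keeps the option handling checkable on any machine.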
Quick Start & Requirements
pip install mlx-audio
pip install -r requirements.txt
Japanese or Mandarin Chinese support: install misaki[ja] or misaki[zh].
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README indicates that Japanese and Mandarin Chinese support requires installing additional misaki packages. The "open output folder" feature of the web interface works only when the server runs locally.
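The language caveat can be made concrete with a small lookup that turns the languages a project needs into the extra requirements to install. This is a minimal sketch: the extras `misaki[ja]` and `misaki[zh]` come from the text above, while `MISAKI_EXTRAS` and `misaki_requirements` are hypothetical names introduced only for illustration.

```python
# Hypothetical mapping from language name to the misaki extra named in the text.
MISAKI_EXTRAS = {"japanese": "ja", "mandarin": "zh"}


def misaki_requirements(languages):
    """Return the pip requirement strings needed for the given languages.

    Languages without an entry in MISAKI_EXTRAS need no extra package
    in this sketch and are skipped; duplicates are dropped.
    """
    requirements = []
    for language in languages:
        extra = MISAKI_EXTRAS.get(language.lower())
        if extra is None:
            continue
        requirement = f"misaki[{extra}]"
        if requirement not in requirements:
            requirements.append(requirement)
    return requirements
```

Each returned string can be passed to `pip install`, quoted so the shell does not interpret the square brackets.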