TTS inference library for Kokoro-82M
Kokoro is an open-weight Text-to-Speech (TTS) library built around the Kokoro-82M model, offering high-quality speech generation with significantly improved speed and cost-efficiency compared to larger models. It is designed for developers and researchers seeking a lightweight yet powerful TTS solution deployable across various environments.
How It Works
Kokoro leverages an 82-million-parameter model, achieving quality comparable to larger systems. It uses the misaki library for G2P (Grapheme-to-Phoneme) conversion, supporting multiple languages. The library is designed for efficient inference, enabling faster generation times and reduced computational overhead.
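The flow described above (text in, phonemes via G2P, audio out) can be sketched as follows. This is a minimal sketch, assuming the `KPipeline` interface, the `"a"` language code, the `af_heart` voice, and the 24 kHz output rate shown in the upstream README; check them against the installed version. The `chunk_path` helper and its file naming are hypothetical, introduced only for illustration.

```python
from pathlib import Path
from typing import List

# Assumed output sample rate in Hz (taken from the upstream README example).
SAMPLE_RATE = 24_000


def chunk_path(out_dir: str, index: int) -> Path:
    """Build the output path for the index-th audio chunk (hypothetical naming)."""
    return Path(out_dir) / f"chunk_{index:03d}.wav"


def synthesize(text: str, out_dir: str = ".",
               lang_code: str = "a", voice: str = "af_heart") -> List[Path]:
    """Generate speech for `text` and write one WAV file per generated chunk."""
    # Imported lazily so the helper above stays usable without these packages.
    import soundfile as sf
    from kokoro import KPipeline

    # Assumption: "a" selects American English, as in the upstream README.
    pipeline = KPipeline(lang_code=lang_code)

    written: List[Path] = []
    # Each yielded item unpacks to (graphemes, phonemes, audio):
    # the input text chunk, its G2P output, and the synthesized waveform.
    for i, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = chunk_path(out_dir, i)
        sf.write(str(path), audio, SAMPLE_RATE)
        written.append(path)
    return written


# Example call (requires kokoro and soundfile to be installed):
# synthesize("Kokoro is a lightweight text-to-speech model.", out_dir=".")
```

Writing one file per chunk mirrors the generator-style interface: long inputs are split into segments, and each segment's phonemes are available for inspection alongside its audio.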
Quick Start & Requirements
pip install -q "kokoro>=0.9.4" soundfile
espeak-ng (needed for English OOD fallback and some non-English languages). Installation instructions for Windows and macOS (MPS GPU acceleration) are provided. A conda environment.yml is available for dependency management.
Highlighted Details
misaki G2P library.
Maintenance & Community
https://discord.gg/QuGxSWBfQy
Licensing & Compatibility
Limitations & Caveats
The misaki library requires separate installation of extras for non-English languages (e.g., pip install misaki[ja] for Japanese, pip install misaki[zh] for Mandarin Chinese). The README also notes that espeak-ng is needed for English OOD fallback and some non-English languages, so languages that depend on espeak-ng or on a specific misaki extra may not work until those are installed.
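To make the fallback idea concrete, here is a toy sketch of lexicon lookup with a fallback for words the lexicon does not cover. It is not how misaki or espeak-ng are implemented: the `LEXICON` entries, the `phonemize` function, and the `placeholder_fallback` stand-in are all hypothetical, and it assumes "OOD" refers to words missing from the G2P dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical two-word lexicon; a real G2P dictionary is far larger.
LEXICON: Dict[str, str] = {
    "hello": "həlˈoʊ",
    "world": "wˈɜːld",
}


def placeholder_fallback(word: str) -> str:
    """Stand-in for a rule-based engine such as espeak-ng.

    It only tags the word so fallback cases are visible in the output.
    """
    return f"<{word}>"


def phonemize(text: str,
              lexicon: Dict[str, str] = LEXICON,
              fallback: Callable[[str], str] = placeholder_fallback) -> List[str]:
    """Look each whitespace-separated word up in the lexicon, else use the fallback."""
    words = text.lower().split()
    return [lexicon[w] if w in lexicon else fallback(w) for w in words]


print(phonemize("Hello kokoro"))
```

In this sketch, every word missing from the lexicon is routed to the fallback, which is why a missing fallback engine would affect any input containing such words.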