Misaki is a versatile grapheme-to-phoneme (G2P) engine designed for speech synthesis models, supporting English, Japanese, Korean, Chinese, and Vietnamese. It offers multiple tokenization strategies and fallback mechanisms for out-of-vocabulary words, catering to researchers and developers in the speech technology domain.
How It Works
Misaki employs a modular design, with distinct tokenizers and phonemizers for each supported language. For English, it offers a transformer-based approach (when trf=True) leveraging contextual word embeddings for homograph disambiguation, and a non-transformer fallback to espeak-ng for robustness. Japanese tokenization utilizes pyopenjtalk with unidic for pitch accent, while older versions relied on cutlet, fugashi, and mecab. Korean and Chinese tokenization are adapted from established libraries like g2pK and paddlespeech, respectively.
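The lookup-then-fallback idea described above can be illustrated with a small, self-contained sketch. This is not Misaki's code or API: the function toy_g2p, the TOY_LEXICON entries, the phoneme strings, and the spell_out_fallback helper are all hypothetical and exist only to show how a lexicon-first G2P step might hand out-of-vocabulary words to a secondary engine such as espeak-ng.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy lexicon; the entries and phoneme strings are illustrative only.
TOY_LEXICON: Dict[str, str] = {
    "hello": "həlˈoʊ",
    "world": "wˈɜːld",
}


def spell_out_fallback(word: str) -> str:
    """Hypothetical stand-in for a real fallback engine (e.g. espeak-ng).

    It only wraps each letter in angle brackets so the fallback path is visible.
    """
    return "".join(f"<{ch}>" for ch in word)


def toy_g2p(
    text: str,
    lexicon: Dict[str, str],
    fallback: Optional[Callable[[str], str]] = None,
) -> Tuple[str, List[Tuple[str, Optional[str]]]]:
    """Convert text to phonemes: lexicon first, then the optional fallback."""
    tokens: List[Tuple[str, Optional[str]]] = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")  # naive punctuation handling for the sketch
        if not word:
            continue
        phonemes = lexicon.get(word)
        if phonemes is None and fallback is not None:
            # Out-of-vocabulary word: delegate to the fallback engine.
            phonemes = fallback(word)
        tokens.append((word, phonemes))
    # Words that nothing could phonemize are rendered as "?".
    rendered = " ".join(p if p is not None else "?" for _, p in tokens)
    return rendered, tokens


if __name__ == "__main__":
    print(toy_g2p("Hello, zorb world!", TOY_LEXICON, spell_out_fallback)[0])
    print(toy_g2p("Hello, zorb world!", TOY_LEXICON)[0])
```

Keeping the fallback as an optional callable means the lexicon path stays dependency-free, and a heavier engine is only pulled in when a caller asks for it.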
Quick Start & Requirements
!pip install -q "misaki[en]"
espeak-ng fallback: !apt-get -qq -y install espeak-ng and !pip install -q "misaki[en]" phonemizer-fork
Highlighted Details
pyopenjtalk and unidic for Japanese pitch accent.
g2pK and paddlespeech for Korean and Chinese tokenization.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is still under active development, with several items on its TODO list, including data compression and training seq2seq fallback models. Escalating hard homographs to BERT-based contextual embeddings with logistic regression is also listed there as open work.