misaki by hexgrad

G2P engine for Kokoro models

Created 7 months ago
309 stars

Top 86.9% on SourcePulse

Project Summary

Misaki is a grapheme-to-phoneme (G2P) engine for speech synthesis, built for Kokoro models. It supports English, Japanese, Korean, Chinese, and Vietnamese, offers multiple tokenization strategies, and provides fallback mechanisms for out-of-vocabulary words. It targets researchers and developers working in speech technology.

How It Works

Misaki has a modular design, with a separate tokenizer and phonemizer for each supported language. For English, it offers a transformer-based mode (enabled with trf=True) that uses contextual word embeddings for homograph disambiguation, alongside a non-transformer mode; espeak-ng is available as a fallback for robustness. Japanese tokenization uses pyopenjtalk with unidic for pitch accent, while older versions relied on cutlet, fugashi, and mecab. Korean tokenization is adapted from g2pK and Chinese tokenization from paddlespeech.
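The lookup-then-fallback idea for out-of-vocabulary words can be sketched in a few lines. This is a toy illustration, not Misaki's implementation: the lexicon entries, the phoneme strings, the `toy_g2p` function, and the `marking_fallback` stand-in for an external engine such as espeak-ng are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lexicon; the phoneme strings are placeholders, not Misaki's data.
LEXICON: Dict[str, str] = {
    "hello": "HH AH L OW",
    "world": "W ER L D",
}

# A fallback takes one word and returns phonemes, or None if it also fails.
Fallback = Callable[[str], Optional[str]]


def marking_fallback(word: str) -> Optional[str]:
    """Stand-in for an external fallback engine; it only marks the word."""
    return f"<{word}>"


def toy_g2p(text: str, fallback: Optional[Fallback] = None, unk: str = "?") -> List[str]:
    """Look each word up in the lexicon, then try the fallback, then emit `unk`."""
    out: List[str] = []
    for word in text.lower().split():
        phonemes = LEXICON.get(word)
        if phonemes is None and fallback is not None:
            phonemes = fallback(word)
        out.append(phonemes if phonemes is not None else unk)
    return out


if __name__ == "__main__":
    print(toy_g2p("Hello misaki"))
    print(toy_g2p("Hello misaki", fallback=marking_fallback))
```

Keeping the fallback as a plain callable means the lexicon path has no hard dependency on the external engine, which matches the optional nature of the espeak-ng install described in this summary.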

Quick Start & Requirements

  • Install English support (notebook syntax; drop the leading ! in a regular shell): !pip install -q "misaki[en]"
  • Add the espeak-ng fallback: !apt-get -qq -y install espeak-ng, then !pip install -q "misaki[en]" phonemizer-fork
  • Official demo: https://hf.co/spaces/hexgrad/Misaki-G2P
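After installation, a call might look like the sketch below. The trf flag comes from the description above; the `en.G2P` constructor, its `british` and `fallback` arguments, and the `(phonemes, tokens)` return pair are assumptions based on the project's README and should be checked against the current version. `phonemize_english` is a hypothetical wrapper that returns None when misaki is not installed.

```python
from typing import Optional


def phonemize_english(text: str, use_transformer: bool = False) -> Optional[str]:
    """Return a phoneme string for `text`, or None if misaki is unavailable."""
    try:
        # Assumed import path; requires `pip install "misaki[en]"`.
        from misaki import en
    except ImportError:
        return None
    # Assumed constructor arguments: `trf` selects the transformer-based mode,
    # `british` selects the accent, `fallback` is an optional OOV handler.
    g2p = en.G2P(trf=use_transformer, british=False, fallback=None)
    # Assumed return value: a (phonemes, tokens) pair.
    phonemes, _tokens = g2p(text)
    return phonemes


if __name__ == "__main__":
    print(phonemize_english("Misaki is a G2P engine designed for Kokoro models."))
```

The first call may download a language model, so expect it to be slower than later calls.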

Highlighted Details

  • Supports multiple languages: English, Japanese, Korean, Chinese, Vietnamese.
  • Offers transformer-based English G2P with contextual word embeddings for homographs.
  • Integrates pyopenjtalk and unidic for Japanese pitch accent.
  • Leverages g2pK and paddlespeech for Korean and Chinese tokenization.
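To make the homograph problem concrete, the sketch below picks a pronunciation from a small table keyed by a context tag that the caller supplies. This is a deliberately simplified stand-in: the real engine is described as using contextual word embeddings, whereas the table, the tags, the placeholder phoneme strings, and the `pronounce` function here are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical homograph table: word -> {context tag -> pronunciation}.
# Phoneme strings are illustrative placeholders.
HOMOGRAPHS: Dict[str, Dict[str, str]] = {
    "read": {"DEFAULT": "R IY D", "PAST": "R EH D"},
    "record": {"DEFAULT": "R EH K ER D", "VERB": "R IH K AO R D"},
}


def pronounce(word: str, tag: Optional[str] = None) -> Optional[str]:
    """Return the variant for `tag`, the default variant, or None if unknown."""
    variants = HOMOGRAPHS.get(word.lower())
    if variants is None:
        return None  # not a known homograph; a normal lexicon lookup would apply
    return variants.get(tag or "DEFAULT", variants["DEFAULT"])


if __name__ == "__main__":
    print(pronounce("read"))            # default variant
    print(pronounce("read", "PAST"))    # past-tense variant
    print(pronounce("Record", "VERB"))  # verb variant
```

In a full system the tag would come from a tagger or a context model rather than from the caller; the table lookup itself stays the same.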

Maintenance & Community

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is under active development, with several items still on its TODO list, including data compression and training seq2seq fallback models. Escalating hard homographs to BERT with logistic regression is also listed there as an open item.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 3
  • Issues (30d): 1

Star History

  • 17 stars in the last 30 days
