misaki by hexgrad

G2P engine for Kokoro models

Created 1 year ago

421 stars

Top 70.1% on SourcePulse


2 Experts Love This Project

Bojan Tunguz

AI Scientist; Formerly at NVIDIA

Alex Chen

Cofounder of Nexa AI

Project Summary

Misaki is a grapheme-to-phoneme (G2P) engine built for the Kokoro speech synthesis models, with support for English, Japanese, Korean, Chinese, and Vietnamese. It provides multiple tokenization strategies and fallback mechanisms for out-of-vocabulary words, and is aimed at researchers and developers working on speech technology.

How It Works

Misaki uses a modular design, with a separate tokenizer and phonemizer for each supported language. For English, setting trf=True enables a transformer-based pipeline that uses contextual word embeddings to help disambiguate homographs; independently, words missing from its dictionaries can be handed to an espeak-ng fallback for robustness. Japanese uses pyopenjtalk with unidic for pitch accent (older versions relied on cutlet, fugashi, and mecab). The Korean and Chinese modules are adapted from g2pK and paddlespeech, respectively.
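The per-language, dictionary-first design with an out-of-vocabulary fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not misaki's API: the names `LexiconG2P`, `REGISTRY`, `phonemize`, and `spell_out_fallback`, the two-word lexicon, and its IPA strings are all hypothetical, and the letter-spelling fallback only stands in for a real fallback such as espeak-ng.

```python
from typing import Callable, Dict, Optional

# A G2P function maps a piece of text to a phoneme string (hypothetical interface).
G2PFunc = Callable[[str], str]


class LexiconG2P:
    """Dictionary-first G2P with an optional fallback for out-of-vocabulary words.

    Hypothetical illustration; not misaki's real classes or data.
    """

    def __init__(self, lexicon: Dict[str, str], fallback: Optional[G2PFunc] = None):
        # Normalize keys so lookups are case-insensitive.
        self.lexicon = {k.lower(): v for k, v in lexicon.items()}
        self.fallback = fallback

    def word(self, w: str) -> str:
        key = w.lower()
        if key in self.lexicon:
            return self.lexicon[key]
        if self.fallback is not None:
            return self.fallback(w)  # OOV word: defer to the fallback
        return "?"  # OOV word and no fallback configured

    def __call__(self, text: str) -> str:
        # Naive whitespace tokenization; real tokenizers are language-specific.
        return " ".join(self.word(w) for w in text.split())


def spell_out_fallback(w: str) -> str:
    # Stand-in for a real fallback such as espeak-ng: spell the letters out.
    return "-".join(w.lower())


# One entry per language; other languages would register their own G2P here.
REGISTRY: Dict[str, G2PFunc] = {
    "en": LexiconG2P({"hello": "həˈloʊ", "world": "wɝld"}, fallback=spell_out_fallback),
}


def phonemize(text: str, lang: str) -> str:
    try:
        g2p = REGISTRY[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None
    return g2p(text)


print(phonemize("Hello world misaki", "en"))  # → həˈloʊ wɝld m-i-s-a-k-i
```

Keeping the lexicon and the fallback as separate, swappable pieces is what lets a dictionary hit stay fast and exact while unknown words still produce some output instead of failing.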

Quick Start & Requirements

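Install English support: !pip install -q "misaki[en]"
For the espeak-ng fallback, also install the system package and phonemizer-fork: !apt-get -qq -y install espeak-ng and !pip install -q "misaki[en]" phonemizer-fork
The leading ! runs these as shell commands from a Jupyter or Colab cell; drop it in a regular terminal.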
Official demo: https://hf.co/spaces/hexgrad/Misaki-G2P

Highlighted Details

Supports multiple languages: English, Japanese, Korean, Chinese, Vietnamese.
Offers transformer-based English G2P with contextual word embeddings for homographs.
Integrates pyopenjtalk and unidic for Japanese pitch accent.
Leverages g2pK and paddlespeech for Korean and Chinese tokenization.
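The highlights credit misaki's transformer mode with using contextual word embeddings for homographs. The sketch below swaps in a deliberately simpler technique, choosing a pronunciation from a part-of-speech tag, only to show what homograph resolution has to decide. The table, the IPA strings, and the function names are hypothetical; the Penn Treebank tags (VB, VBD, NN, ...) are supplied by hand here, whereas a real pipeline would get them from a tagger.

```python
from typing import Dict, List, Tuple

# Hypothetical homograph table: word -> {Penn Treebank tag -> pronunciation}.
# "DEFAULT" is used when the tag has no specific entry.
HOMOGRAPHS: Dict[str, Dict[str, str]] = {
    "read": {"VBD": "ɹɛd", "VBN": "ɹɛd", "DEFAULT": "ɹiːd"},
    "record": {"VB": "ɹɪˈkɔɹd", "VBP": "ɹɪˈkɔɹd", "DEFAULT": "ˈɹɛkɚd"},
}


def pick_pronunciation(word: str, tag: str) -> str:
    entry = HOMOGRAPHS.get(word.lower())
    if entry is None:
        raise KeyError(f"{word!r} is not in the homograph table")
    return entry.get(tag, entry["DEFAULT"])


def resolve(tokens: List[Tuple[str, str]]) -> List[str]:
    # Swap homographs for their tag-selected pronunciation; leave other words
    # untouched for ordinary lexicon lookup.
    return [pick_pronunciation(w, t) if w.lower() in HOMOGRAPHS else w for w, t in tokens]


print(resolve([("They", "PRP"), ("record", "VBP"), ("a", "DT"), ("record", "NN")]))
# → ['They', 'ɹɪˈkɔɹd', 'a', 'ˈɹɛkɚd']
```

A tag alone does not always settle the choice (present-tense "read" and past-tense "read" can be hard to tell apart without wider context), which is one motivation for using contextual embeddings instead.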

Maintenance & Community

Active development with contributions from various individuals and projects.
Discord server available: https://discord.gg/QuGxSWBfQy

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is still under active development, and its TODO list includes compressing data and training seq2seq fallback models. It also proposes escalating hard homographs to BERT contextual word embeddings combined with logistic regression classifiers.
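The proposed escalation pairs BERT contextual embeddings with logistic regression. The sketch below shows only the logistic-regression half: a binary classifier that maps an embedding vector to one of two pronunciations, trained by batch gradient descent on the cross-entropy loss. The class name `HomographLR`, the 2-D toy vectors standing in for BERT embeddings, and the label-to-pronunciation mapping are hypothetical; real embeddings would have hundreds of dimensions and come from a pretrained model.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class HomographLR:
    """Binary logistic regression over a context-embedding vector (hypothetical)."""

    def __init__(self, dim: int):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x: Sequence[float]) -> float:
        # Probability of label 1 given the embedding x.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500) -> "HomographLR":
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, t in zip(X, y):
                err = self.prob(x) - t  # d(cross-entropy)/d(logit)
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # Batch gradient-descent step on the mean loss.
            self.w = [wj - lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
        return self

    def predict(self, x: Sequence[float]) -> int:
        return int(self.prob(x) >= 0.5)


# Toy 2-D vectors standing in for contextual embeddings of "lead" in different sentences.
X = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
y = [1, 1, 0, 0]
PRONUNCIATIONS = {1: "lɛd", 0: "liːd"}  # hypothetical: 1 = the metal, 0 = to guide

model = HomographLR(dim=2).fit(X, y)
print(PRONUNCIATIONS[model.predict([0.95, 0.05])])  # → lɛd
```

One small classifier per hard homograph keeps each decision cheap once the embedding is computed, which fits the idea of escalating only the difficult words rather than running the expensive path on every token.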

Health Check

Last Commit

7 months ago

Responsiveness

1 day

Star History

19 stars in the last 30 days