kokoro  by hexgrad

TTS inference library for Kokoro-82M

created 6 months ago
3,795 stars

Top 13.1% on sourcepulse

Project Summary

Kokoro is an open-weight Text-to-Speech (TTS) library built around the Kokoro-82M model, offering high-quality speech generation with significantly improved speed and cost-efficiency compared to larger models. It is designed for developers and researchers seeking a lightweight yet powerful TTS solution deployable across various environments.

How It Works

Kokoro runs an 82-million-parameter model that achieves quality comparable to larger systems. It uses the misaki library for G2P (grapheme-to-phoneme) conversion, which is what provides multi-language support. The library is built for efficient inference, so generation is faster and needs less compute than larger models.

Quick Start & Requirements

  • Install: pip install -q "kokoro>=0.9.4" soundfile (quote the version specifier; in most shells an unquoted >= is treated as output redirection)
  • Prerequisites: espeak-ng (for English out-of-dictionary (OOD) fallback and some non-English languages). Installation instructions for Windows and macOS (MPS GPU acceleration) are provided. A conda environment.yml is available for dependency management.
  • Demo: A Google Colab notebook is linked for easy experimentation.
  • Docs: Official documentation and sample audio are available.
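
As a rough illustration of what inference looks like after installation, the sketch below follows the usage pattern shown in the project's README: build a KPipeline, call it on text, and write each yielded audio segment with soundfile. The voice name af_heart, the language code "a" (American English), and the 24 kHz sample rate are taken from that README and may differ in other versions; the helper names segment_path and synthesize and the output file naming are hypothetical choices made for this example.

```python
from pathlib import Path

# 24 kHz is the sample rate used in the README example (assumption: unchanged
# in the installed version).
SAMPLE_RATE = 24000


def segment_path(out_dir, index):
    """Hypothetical naming scheme: one numbered WAV file per generated segment."""
    return Path(out_dir) / f"segment_{index:03d}.wav"


def synthesize(text, out_dir=".", voice="af_heart", lang_code="a"):
    """Generate speech for `text` and return the paths of the written WAV files.

    kokoro and soundfile are imported lazily so that the helpers in this
    module can be imported on machines where they are not installed.
    """
    from kokoro import KPipeline
    import soundfile as sf

    pipeline = KPipeline(lang_code=lang_code)
    written = []
    # The pipeline yields one (graphemes, phonemes, audio) result per segment.
    for index, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = segment_path(out_dir, index)
        sf.write(str(path), audio, SAMPLE_RATE)
        written.append(path)
    return written
```

Calling synthesize("Kokoro is a small text-to-speech model.") should leave one or more segment_NNN.wav files in the current directory. The first run is likely to download model weights, so expect it to take longer than later runs.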

Highlighted Details

  • Lightweight 82M parameter model.
  • High-quality speech generation.
  • Faster and more cost-efficient than larger models.
  • Apache-licensed weights for broad deployment.
  • Multi-language support via misaki G2P library.

Maintenance & Community

  • Active Discord server: https://discord.gg/QuGxSWBfQy
  • Acknowledgements credit StyleTTS 2 and the TTS Spaces Arena.

Licensing & Compatibility

  • License: Apache-licensed weights.
  • Compatibility: Suitable for commercial and personal projects due to permissive licensing.

Limitations & Caveats

Non-English languages need an extra misaki install (e.g., pip install misaki[ja] for Japanese, pip install misaki[zh] for Mandarin Chinese). The README also says espeak-ng is needed for English out-of-dictionary fallback and for some non-English languages, which implies that a language may work poorly or not at all without espeak-ng or the matching misaki extra.
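
The checks described above can be sketched as a small preflight helper. This is a minimal sketch, not part of Kokoro or misaki: the function name preflight, the MISAKI_EXTRAS table, and its "ja"/"zh" keys are hypothetical, and only the two extras named in this section are included. It looks for an espeak-ng executable on PATH and suggests the pip command for the language's extra. It does not detect whether that extra is already installed.

```python
import shutil

# Extras named in the text above; other languages are deliberately left out
# because this summary does not list them.
MISAKI_EXTRAS = {"ja": "misaki[ja]", "zh": "misaki[zh]"}


def preflight(language, which=shutil.which):
    """Return a list of setup hints for `language` (empty if nothing to report).

    `which` is injectable so the check can be exercised without touching
    the real PATH.
    """
    hints = []
    if which("espeak-ng") is None:
        hints.append("espeak-ng not found on PATH; install it for fallback support")
    extra = MISAKI_EXTRAS.get(language)
    if extra is not None:
        # Quoted because some shells (e.g. zsh) treat square brackets as globs.
        hints.append(f'pip install "{extra}"')
    return hints
```

For example, preflight("ja") on a machine without espeak-ng returns two hints: one about the missing binary and one with the pip command for the Japanese extra.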

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 9
  • Issues (30d): 5
  • Star History: 1,234 stars in the last 90 days
