kokoro by hexgrad

TTS inference library for Kokoro-82M

Created 8 months ago
4,394 stars

Top 11.2% on SourcePulse

Project Summary

Kokoro is an open-weight text-to-speech (TTS) library built around the Kokoro-82M model. It aims for speech quality comparable to larger models while running faster and at lower cost. It targets developers and researchers who want a lightweight TTS system that can be deployed across a range of environments.

How It Works

Kokoro runs an 82-million-parameter model that, according to the project, achieves quality comparable to larger systems. Text is first converted to phonemes by the misaki G2P (grapheme-to-phoneme) library, which supports multiple languages; the model then synthesizes audio from those phonemes. The small model size keeps generation fast and computational overhead low.

Quick Start & Requirements

  • Install: pip install -q "kokoro>=0.9.4" soundfile (quote the requirement; an unquoted >= is treated by the shell as an output redirect)
  • Prerequisites: espeak-ng, used as a fallback for out-of-dictionary (OOD) English words and for some non-English languages. Installation instructions cover Windows and macOS (with MPS GPU acceleration). A conda environment.yml is available for dependency management.
  • Demo: A Google Colab notebook is linked for easy experimentation.
  • Docs: Official documentation and sample audio are available.
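
A minimal usage sketch, assuming the KPipeline interface shown in the upstream README rather than anything stated in this summary. The lang_code "a" (American English), the voice name "af_heart", and the 24 kHz sample rate are taken from that README and may change between releases. The helper names make_pipeline and synthesize_to_wav are hypothetical.

```python
from typing import Callable, List, Optional

# Assumption: Kokoro produces 24 kHz audio, as in the upstream README example.
SAMPLE_RATE = 24000


def make_pipeline(lang_code: str = "a"):
    """Build a Kokoro pipeline.

    The import is done lazily so this module can be loaded even when
    kokoro is not installed. "a" is assumed to select American English.
    """
    from kokoro import KPipeline  # requires: pip install "kokoro>=0.9.4"

    return KPipeline(lang_code=lang_code)


def synthesize_to_wav(
    pipeline: Callable,
    text: str,
    voice: str = "af_heart",  # assumed voice name from the upstream README
    prefix: str = "out",
    write: Optional[Callable] = None,
) -> List[str]:
    """Run the pipeline on `text` and write one WAV file per yielded segment.

    `pipeline` is any callable that yields (graphemes, phonemes, audio)
    tuples. `write` defaults to soundfile.write and can be swapped out.
    Returns the list of file paths passed to `write`.
    """
    if write is None:
        import soundfile as sf  # requires: pip install soundfile

        write = sf.write

    paths: List[str] = []
    for i, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = f"{prefix}_{i}.wav"
        write(path, audio, SAMPLE_RATE)
        paths.append(path)
    return paths
```

With kokoro and soundfile installed, synthesize_to_wav(make_pipeline(), "Hello from Kokoro.") should write one WAV file per generated segment and return their paths. Passing the pipeline and writer in as arguments keeps the loop easy to try out with stand-ins before downloading the model.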

Highlighted Details

  • Lightweight 82M parameter model.
  • High-quality speech generation.
  • Faster and more cost-efficient than larger models.
  • Apache-licensed weights for broad deployment.
  • Multi-language support via misaki G2P library.

Maintenance & Community

  • Active Discord server: https://discord.gg/QuGxSWBfQy
  • The README's acknowledgements mention StyleTTS 2 and the TTS Spaces Arena.

Licensing & Compatibility

  • License: Apache 2.0 for the model weights.
  • Compatibility: Suitable for commercial and personal projects due to permissive licensing.

Limitations & Caveats

Non-English languages require extra misaki installs (e.g., pip install misaki[ja] for Japanese, pip install misaki[zh] for Mandarin Chinese). Per the README, espeak-ng is needed for the English OOD fallback and for some non-English languages, so a language may not work correctly without espeak-ng or the matching misaki extra.
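
The prerequisites above can be checked before running inference. The following is a rough sketch using only the Python standard library; it assumes the espeak-ng executable is named espeak-ng and is found via PATH, and it only knows the two misaki extras named above. The function names are hypothetical and are not part of kokoro or misaki.

```python
import shutil
from typing import List, Optional

# Only the extras named in the text above; other languages are not listed there.
MISAKI_EXTRAS = {"japanese": "ja", "mandarin": "zh"}


def misaki_install_command(language: str) -> Optional[str]:
    """Return the pip command for a language's misaki extra, or None if unknown."""
    extra = MISAKI_EXTRAS.get(language.lower())
    if extra is None:
        return None
    # Quote the requirement so shells such as zsh do not expand the brackets.
    return f'pip install "misaki[{extra}]"'


def espeak_ng_available() -> bool:
    """Assumption: espeak-ng is installed as an executable named "espeak-ng" on PATH."""
    return shutil.which("espeak-ng") is not None


def preflight(language: str) -> List[str]:
    """Collect setup reminders for `language`.

    This does not detect whether the misaki extra is already installed;
    it only reports the command that provides it.
    """
    notes: List[str] = []
    if not espeak_ng_available():
        notes.append("espeak-ng not found on PATH")
    command = misaki_install_command(language)
    if command is not None:
        notes.append(f"language extra: {command}")
    return notes
```

For example, preflight("Japanese") returns a list that includes the misaki[ja] install command, plus a note about espeak-ng if it is not found on PATH.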

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 6
  • Issues (30d): 5
  • Star history: 372 stars in the last 30 days
