kokoro by hexgrad

TTS inference library for Kokoro-82M

Created 1 year ago

5,762 stars

Top 8.7% on SourcePulse

3 Experts Love This Project

Alex Chen

Cofounder of Nexa AI

David Singleton

Cofounder of /dev/agents; Ex-CTO of Stripe

Simon Willison

Coauthor of Django

Project Summary

Kokoro is an open-weight Text-to-Speech (TTS) library built around the Kokoro-82M model, offering high-quality speech generation with significantly improved speed and cost-efficiency compared to larger models. It is designed for developers and researchers seeking a lightweight yet powerful TTS solution deployable across various environments.

How It Works

Kokoro runs an 82-million-parameter model that the project describes as comparable in quality to larger systems. Input text is first converted to phonemes by the misaki G2P (grapheme-to-phoneme) library, which also provides the multi-language support; the model then synthesizes audio from those phonemes. The small model size keeps generation fast and computational overhead low.
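The text-to-phonemes-to-audio flow can be sketched roughly as follows. This is a minimal sketch rather than the project's reference code: it assumes the `KPipeline` class, the `'a'` (American English) language code, the `af_heart` voice, and the 24 kHz output rate shown in the upstream README, all of which should be checked against the installed version. The `segment_path` and `synthesize` helpers and the output file naming are hypothetical.

```python
import os

# Assumption: Kokoro emits 24 kHz audio, as in the upstream README example.
SAMPLE_RATE = 24000


def segment_path(out_dir, index):
    """Hypothetical naming scheme: one zero-padded WAV file per generated segment."""
    return os.path.join(out_dir, f"segment_{index:03d}.wav")


def synthesize(text, voice="af_heart", lang_code="a", out_dir="."):
    """Run text through the G2P + TTS pipeline and write one WAV per segment.

    The imports are deferred so this module loads even when kokoro and
    soundfile are not installed yet.
    """
    from kokoro import KPipeline  # assumption: API as shown in the README
    import soundfile as sf

    pipeline = KPipeline(lang_code=lang_code)
    paths = []
    # Each yielded item unpacks to (source text, phoneme string, audio samples).
    for i, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = segment_path(out_dir, i)
        sf.write(path, audio, SAMPLE_RATE)
        paths.append(path)
    return paths
```

Calling `synthesize("Hello from Kokoro.")` would download the model weights on first use and return the list of WAV paths it wrote. Long inputs are split into several segments, which is why the sketch writes one file per segment instead of a single file.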

Quick Start & Requirements

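Install: pip install -q "kokoro>=0.9.4" soundfile (quote the version specifier; unquoted, most shells treat > as output redirection)
Prerequisites: espeak-ng, used as the fallback for English words that are out of dictionary (OOD) and for some non-English languages. The README provides installation instructions for Windows and for macOS, including MPS GPU acceleration. A conda environment.yml is available for dependency management.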
Demo: A Google Colab notebook is linked for easy experimentation.
Docs: Official documentation and sample audio are available.
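Before a first run, the prerequisites listed above can be checked with a few lines of standard-library Python. This is a sketch under assumptions: the `missing_prerequisites` helper is hypothetical, and finding `espeak-ng` on `PATH` is only a heuristic, since some setups provide it another way (for example a non-standard install location on Windows).

```python
import importlib.util
import shutil


def missing_prerequisites(packages=("kokoro", "soundfile"), binaries=("espeak-ng",)):
    """Return a list of human-readable entries for anything that cannot be found."""
    missing = []
    for name in packages:
        # find_spec returns None when a top-level package is not importable.
        if importlib.util.find_spec(name) is None:
            missing.append(f"python package: {name}")
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(name) is None:
            missing.append(f"system binary: {name}")
    return missing
```

An empty list means everything was found; otherwise each entry names what still needs installing.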

Highlighted Details

Lightweight 82M parameter model.
High-quality speech generation.
Faster and more cost-efficient than larger models.
Apache-licensed weights for broad deployment.
Multi-language support via misaki G2P library.

Maintenance & Community

Active Discord server: https://discord.gg/QuGxSWBfQy
The README's acknowledgements credit StyleTTS 2 and the TTS Spaces Arena.

Licensing & Compatibility

License: Apache 2.0 (model weights).
Compatibility: The permissive license allows use in both commercial and personal projects.

Limitations & Caveats

Non-English languages need extra misaki components installed separately (e.g., pip install "misaki[ja]" for Japanese, pip install "misaki[zh]" for Mandarin Chinese). The README also states that espeak-ng is needed for English out-of-dictionary (OOD) fallback and for some non-English languages. Without espeak-ng or the matching misaki extra, affected languages and unknown English words may not be handled correctly.
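One way to surface this requirement early is to map a language code to the misaki extra it needs before building a pipeline. The sketch below covers only the two extras named above. The single-letter codes `'j'` and `'z'` are an assumption taken from the upstream README and should be verified, and the `MISAKI_EXTRAS` table and `install_hint` helper are hypothetical.

```python
# Assumption: 'j' (Japanese) and 'z' (Mandarin Chinese) are the pipeline
# language codes that need the misaki extras named in this section.
MISAKI_EXTRAS = {"j": "ja", "z": "zh"}


def install_hint(lang_code):
    """Return the pip command for the misaki extra a language needs, or None."""
    extra = MISAKI_EXTRAS.get(lang_code)
    if extra is None:
        return None
    # The requirement is quoted so shells such as zsh do not expand the brackets.
    return f'pip install "misaki[{extra}]"'
```

Languages absent from the table return `None`, which here means only that this sketch knows of no extra for them, not that none is required.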

Health Check

Last Commit

6 months ago

Responsiveness

1 day

Star History

309 stars in the last 30 days