kokoro-ios  by mlalma

Fast, high-quality text-to-speech for Apple platforms

Created 1 year ago
254 stars

Top 99.0% on SourcePulse

Project Summary

Kokoro TTS for iOS/macOS is a high-quality English text-to-speech engine for Apple platforms that generates audio faster than real time. It targets developers integrating speech synthesis into their applications and runs on Apple's MLX framework.

How It Works

This project ports the Kokoro TTS engine, as implemented in MLX-Audio, to MLX Swift so that it runs natively on Apple hardware. Input text first goes through Grapheme-to-Phoneme (G2P) conversion, primarily via the MisakiSwift library, and the resulting phonemes are passed to the neural synthesis model. The main advantage is the MLX Swift implementation itself, which produces audio faster than real time (about 3.3x on an iPhone 13 Pro, per the benchmark under Highlighted Details).

Quick Start & Requirements

Install via Swift Package Manager by adding .package(url: "https://github.com/mlalma/kokoro-ios.git", from: "1.0.0") to the dependencies in your Package.swift. The library requires iOS 18.0+ or macOS 15.0+. Key dependencies include MLX Swift, MisakiSwift, and MLXUtilsLibrary. The package does not ship model weights: users must supply their own Kokoro TTS model files and voice style embeddings, typically bundled with the integrating application. See the Kokoro Test App for usage examples.
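
As a rough illustration of the installation step, a package manifest might look like the sketch below. The package and target name MyTTSApp is hypothetical, and the product name KokoroSwift is an assumption that should be checked against the repository's own Package.swift. The platform versions mirror the stated iOS 18.0+ / macOS 15.0+ requirement.

```swift
// swift-tools-version: 6.0
// Tools version 6.0 is used because the .v18 / .v15 platform constants
// are only available from that manifest version onward.
import PackageDescription

let package = Package(
    name: "MyTTSApp", // hypothetical package name, for illustration only
    platforms: [
        .iOS(.v18),   // matches the stated iOS 18.0+ requirement
        .macOS(.v15)  // matches the stated macOS 15.0+ requirement
    ],
    dependencies: [
        // Dependency line as given in the Quick Start text.
        .package(url: "https://github.com/mlalma/kokoro-ios.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyTTSApp",
            dependencies: [
                // ASSUMPTION: the library product is named "KokoroSwift";
                // verify against the repository's Package.swift.
                .product(name: "KokoroSwift", package: "kokoro-ios")
            ]
        )
    ]
)
```

The model files and voice style embeddings are not pulled in by this manifest; they would still need to be added to the app bundle separately.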

Highlighted Details

  • Added token timestamps for finer-grained audio control (v1.0.8).
  • Voice style management is externalized to the integrating application (v1.0.5).
  • Achieves approximately 3.3x faster-than-real-time audio generation on an iPhone 13 Pro (release build, post-warm-up).
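
To make the "3.3x faster-than-real-time" figure concrete, the sketch below computes a real-time factor as seconds of audio produced divided by wall-clock seconds spent generating it. The function names, the sample count, the timing, and the 24 kHz sample rate are all illustrative assumptions, not measurements or API from the project.

```swift
import Foundation

/// Converts a sample count to a duration in seconds.
func audioDuration(sampleCount: Int, sampleRate: Double) -> Double {
    precondition(sampleRate > 0, "sample rate must be positive")
    return Double(sampleCount) / sampleRate
}

/// Real-time factor: seconds of audio produced per second of generation time.
/// A value above 1.0 means generation is faster than playback.
func realTimeFactor(audioSeconds: Double, generationSeconds: Double) -> Double {
    precondition(generationSeconds > 0, "generation time must be positive")
    return audioSeconds / generationSeconds
}

// Hypothetical numbers chosen to reproduce the quoted ratio.
let assumedSampleRate = 24_000.0   // ASSUMPTION: output sample rate in Hz
let generatedSamples = 792_000     // hypothetical sample count (33 s of audio)
let generationSeconds = 10.0       // hypothetical wall-clock time

let audioSeconds = audioDuration(sampleCount: generatedSamples,
                                 sampleRate: assumedSampleRate)
let factor = realTimeFactor(audioSeconds: audioSeconds,
                            generationSeconds: generationSeconds)
print(String(format: "%.1fx real time", factor)) // prints "3.3x real time"
```

When reproducing such a benchmark, the source's qualifiers matter: the quoted figure is for a release build after warm-up, so the first synthesis call and debug builds should be excluded from the timing.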

Maintenance & Community

The README does not list maintainers, sponsorships, or community channels (such as Discord or Slack).

Licensing & Compatibility

The project is licensed under the MIT License, which is generally permissive for commercial use and integration into closed-source applications.

Limitations & Caveats

Users must independently source and manage the large TTS model files and voice style embeddings. The library mandates relatively recent Apple operating system versions (iOS 18+, macOS 15+). Integration requires familiarity with Swift Package Manager and Apple's development ecosystem.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 24 stars in the last 30 days
