speech-swift by soniqo

Apple Silicon speech AI toolkit

Created 1 month ago
377 stars

Top 75.7% on SourcePulse


Summary

This project provides a comprehensive suite of AI speech models (ASR, TTS, speech-to-speech, VAD, diarization) optimized for on-device execution on Apple Silicon. It targets developers building native macOS and iOS applications, using the MLX and CoreML frameworks to deliver fast inference while keeping audio on the device for privacy.

How It Works

The toolkit integrates speech models including Qwen3-ASR, CosyVoice TTS, and PersonaPlex, and runs them on two backends: Apple's MLX library for GPU acceleration and CoreML for Neural Engine efficiency. MLX favors high throughput, while CoreML favors power-optimized inference; either way, real-time speech applications run directly on user devices without cloud dependencies.
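To make the trade-off concrete, here is a minimal sketch of how an app might pick a backend per workload. The `InferenceBackend`, `WorkloadHints`, and `chooseBackend` names are hypothetical illustrations, not speech-swift's API; the project's own documentation describes how backends are actually selected.

```swift
// Hypothetical names for illustration only; not speech-swift's API.
enum InferenceBackend: String {
    case mlx = "MLX"        // GPU via MLX: favors throughput
    case coreML = "CoreML"  // Neural Engine via CoreML: favors power efficiency
}

struct WorkloadHints {
    var needsMaxThroughput: Bool  // e.g. batch transcription of long recordings
    var isPowerConstrained: Bool  // e.g. running on battery or on an iPhone
}

/// Chooses a backend following the trade-off described above:
/// throughput-critical work always goes to MLX; otherwise,
/// power-constrained work goes to CoreML.
func chooseBackend(for hints: WorkloadHints) -> InferenceBackend {
    if hints.needsMaxThroughput { return .mlx }
    return hints.isPowerConstrained ? .coreML : .mlx
}

let batchJob = WorkloadHints(needsMaxThroughput: true, isPowerConstrained: true)
let alwaysOnVAD = WorkloadHints(needsMaxThroughput: false, isPowerConstrained: true)
print(chooseBackend(for: batchJob).rawValue)     // MLX
print(chooseBackend(for: alwaysOnVAD).rawValue)  // CoreML
```

Keeping the decision in one pure function makes the policy easy to test and to change if measurements on a particular device favor the other backend.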

Quick Start & Requirements

Installation is streamlined via Homebrew (brew install speech) or Swift Package Manager. Building from source requires cloning the repository (git clone) and running make build. Essential prerequisites include Swift 5.9+, macOS 14+ or iOS 17+, Apple Silicon hardware, and Xcode 15+ with the Metal Toolchain. Initial model downloads range from a few megabytes to several gigabytes, depending on the model.
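These prerequisites can be expressed in app code as a pure check over a device profile. This is a sketch under stated assumptions: `ApplePlatform`, `DeviceProfile`, and `meetsRequirements` are hypothetical names, and the thresholds simply mirror the requirements listed above (Apple Silicon, macOS 14+ or iOS 17+).

```swift
// Hypothetical types for illustration; not part of speech-swift.
enum ApplePlatform {
    case macOS
    case iOS
}

struct DeviceProfile {
    let platform: ApplePlatform
    let osMajorVersion: Int
    let isAppleSilicon: Bool  // true for Apple Silicon (arm64) hardware
}

/// Returns true when the profile meets the stated prerequisites:
/// Apple Silicon hardware and macOS 14+ or iOS 17+.
func meetsRequirements(_ device: DeviceProfile) -> Bool {
    guard device.isAppleSilicon else { return false }
    switch device.platform {
    case .macOS:
        return device.osMajorVersion >= 14
    case .iOS:
        return device.osMajorVersion >= 17
    }
}

let sonomaMac = DeviceProfile(platform: .macOS, osMajorVersion: 14, isAppleSilicon: true)
let olderPhone = DeviceProfile(platform: .iOS, osMajorVersion: 16, isAppleSilicon: true)
print(meetsRequirements(sonomaMac))   // true
print(meetsRequirements(olderPhone))  // false
```

In a real app, Swift's `#available(macOS 14, iOS 17, *)` condition or `ProcessInfo.processInfo.operatingSystemVersion` could supply the OS version instead of a hand-built profile.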

Highlighted Details

  • Apple Silicon Native: Optimized for M-series chips, achieving faster-than-real-time performance (real-time factor, RTF < 1.0) on many tasks.
  • Broad Functionality: Encompasses speech recognition, text-to-speech, full-duplex speech-to-speech, voice activity detection, speaker diarization, and speech enhancement.
  • Flexible Backends: Supports both MLX for maximum GPU throughput and CoreML for Neural Engine utilization, balancing performance and power consumption.
  • Interactive Demos: Includes ready-to-run examples like PersonaPlexDemo for a conversational voice assistant.
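The real-time factor behind the RTF < 1.0 claim is processing time divided by audio duration. The helper below is a generic sketch of that arithmetic; the function name and the sample timings are illustrative, not measured results for this toolkit.

```swift
/// Real-time factor (RTF): wall-clock processing time divided by audio duration.
/// RTF < 1.0 means the audio was processed faster than real time.
/// Returns nil for a non-positive audio duration, where RTF is undefined.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double? {
    guard audioSeconds > 0 else { return nil }
    return processingSeconds / audioSeconds
}

// Illustrative numbers only: 10 s of audio processed in 2.5 s.
if let rtf = realTimeFactor(processingSeconds: 2.5, audioSeconds: 10.0) {
    print(rtf)        // 0.25
    print(rtf < 1.0)  // true: faster than real time
}
```

An RTF of 0.25 means each second of audio takes a quarter of a second to process; a value above 1.0 cannot keep up with a live stream.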

Maintenance & Community

The project is under active development; recent news items announce new features such as speaker diarization and PersonaPlex integration. A roadmap discussion is open for community input, and contributions via pull requests are welcome.

Licensing & Compatibility

The project is released under the permissive Apache 2.0 license, which permits commercial use and integration into closed-source applications.

Limitations & Caveats

The toolkit runs only on Apple Silicon hardware with recent macOS/iOS versions; x86_64 Macs and Rosetta translation are unsupported. GPU acceleration via MLX depends on correctly building the MLX Metal library, which can be a setup hurdle. CoreML is more power-efficient but may deliver lower throughput than MLX for single-model tasks.
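Because x86_64 builds and Rosetta translation are unsupported, an app could detect both conditions at startup and show a clear message. The sketch below uses Swift's `#if arch(arm64)` compile-time condition and the `sysctl.proc_translated` key that Apple documents for detecting Rosetta; the function names are hypothetical, and this is a diagnostic aid, not something speech-swift itself is known to do.

```swift
#if canImport(Darwin)
import Darwin
#endif

/// Hypothetical helper: true if this binary was compiled for arm64.
/// On Apple platforms, an arm64 build implies native Apple Silicon execution.
func isNativeARM64Build() -> Bool {
    #if arch(arm64)
    return true
    #else
    return false
    #endif
}

/// Hypothetical helper: true if the process is being translated by Rosetta 2.
/// Reads the `sysctl.proc_translated` key; if the key is missing (e.g. ENOENT
/// on Intel-only systems) or the platform is not Darwin, it reports false.
func isRunningUnderRosetta() -> Bool {
    #if canImport(Darwin)
    var translated: Int32 = 0
    var size = MemoryLayout<Int32>.size
    let result = sysctlbyname("sysctl.proc_translated", &translated, &size, nil, 0)
    guard result == 0 else { return false }
    return translated == 1
    #else
    return false
    #endif
}

if isNativeARM64Build() {
    print("Native arm64 build: supported")
} else if isRunningUnderRosetta() {
    print("x86_64 build translated by Rosetta: unsupported, install the arm64 build")
} else {
    print("Non-Apple-Silicon build: unsupported")
}
```

An arm64 binary never runs under Rosetta, so the `sysctl` check only matters in an x86_64 build, where it distinguishes an Intel Mac from an Apple Silicon Mac running translated code.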

Health Check

Last Commit: 21 hours ago
Responsiveness: Inactive
Pull Requests (30d): 58
Issues (30d): 55
Star History: 347 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

moonshine by moonshine-ai

2.9%
7k stars
Speech-to-text models optimized for fast, accurate ASR on edge devices
Created 1 year ago
Updated 3 days ago