WhisperKit by argmaxinc

Speech recognition framework for Apple Silicon

created 1 year ago
4,870 stars

Top 10.4% on sourcepulse

Project Summary

WhisperKit is an on-device speech-to-text framework for Apple Silicon, enabling advanced features like real-time streaming and word timestamps. It targets developers building applications for Apple platforms who need efficient and private transcription capabilities.

How It Works

WhisperKit leverages Apple's Core ML framework to deploy state-of-the-art speech recognition models, such as OpenAI's Whisper, directly on user devices. This approach ensures data privacy, low latency, and offline functionality by avoiding cloud-based processing. The framework is optimized for Apple Silicon, maximizing performance and efficiency.

Quick Start & Requirements

  • Installation: Via Swift Package Manager (https://github.com/argmaxinc/whisperkit) or Homebrew (brew install whisperkit-cli).
  • Prerequisites: macOS 14.0+, Xcode 15.0+.
  • Setup: Add the package to an Xcode project via Swift Package Manager; the CLI requires git-lfs for model downloads.
  • Docs: Benchmarks & Device Support
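
Once the package is added, a first transcription takes only a few lines. The sketch below follows the usage pattern in the project's README; the exact initializer and return type can differ between WhisperKit versions, and the audio path is a placeholder:

```swift
import WhisperKit

Task {
    // Initializing without arguments downloads and loads a default
    // Whisper model on first run (network access required once).
    let pipe = try await WhisperKit()

    // Transcribe a local audio file; common formats such as
    // wav, mp3, m4a, and flac are supported.
    let transcription = try await pipe.transcribe(audioPath: "path/to/audio.wav")?.text
    print(transcription ?? "no transcription produced")
}
```

Because model loading and inference are asynchronous, the calls are awaited inside a `Task`; in an app you would typically run this off the main actor.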

Highlighted Details

  • On-device deployment for privacy and offline use.
  • Real-time streaming, word timestamps, and VAD.
  • Supports custom Core ML models via HuggingFace.
  • Includes whisperkittools for model generation.
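
To use one of the custom Core ML models hosted on HuggingFace, the pipeline can be initialized with an explicit model name. A hedged sketch, assuming the `model:` initializer parameter documented in the README (the variant name is illustrative and availability may vary):

```swift
import WhisperKit

Task {
    // Request a specific Whisper variant instead of the default;
    // WhisperKit resolves the name against the argmaxinc model
    // repository on HuggingFace and downloads it if needed.
    let pipe = try await WhisperKit(model: "large-v3")

    let text = try await pipe.transcribe(audioPath: "path/to/audio.wav")?.text
    print(text ?? "no transcription produced")
}
```

Larger variants improve accuracy at the cost of download size, memory, and latency, so smaller models are usually preferable on iPhone-class devices.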

Maintenance & Community

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration into closed-source applications.

Limitations & Caveats

The project references paid "WhisperKit Pro and SpeakerKit Pro" tiers for enhanced features, suggesting the open-source version may lack some advanced capabilities, such as speaker diarization. Evaluating the commercial offerings requires contacting the maintainers directly.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 317 stars in the last 90 days

