Lightning-SimulWhisper by altalt-org

Accelerated local speech transcription for Apple Silicon

Created 3 months ago
364 stars

Top 77.3% on SourcePulse

Project Summary

Lightning-SimulWhisper delivers high-performance, real-time, local speech-to-text transcription optimized for Apple Silicon devices. By integrating MLX and CoreML for efficient on-device processing, it yields substantial speed gains and better power efficiency than standard PyTorch pipelines, serving users who need fast, responsive transcription without cloud dependencies.

How It Works

The project employs a hybrid architecture for Whisper transcription on Apple Silicon. CoreML accelerates encoder inference on the Apple Neural Engine, achieving up to 18x speedups with reduced power draw. MLX drives the decoder, offering up to 15x speed improvements over PyTorch and supporting the AlignAtt policy for simultaneous decoding as well as configurable beam search. This dual-framework approach optimizes both transcription speed and system efficiency.
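As a rough illustration of this encoder/decoder split, here is a minimal sketch using NumPy stand-ins: `encode` plays the role of the CoreML/ANE encoder and `decode_step` the MLX decoder loop. All function names, shapes, and the toy greedy decoding here are illustrative assumptions, not the project's actual API.

```python
import numpy as np

def encode(mel: np.ndarray) -> np.ndarray:
    """Stand-in for the CoreML/ANE encoder: mel frames -> hidden states.
    In the real project this pass runs once per audio chunk on the ANE."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((mel.shape[1], 8))
    return np.tanh(mel @ w)  # (frames, 8)

def decode_step(enc: np.ndarray, prev_tokens: list[int]) -> int:
    """Stand-in for one MLX decoder step: greedy pick over a toy vocab
    (seeded by step index so successive tokens can differ)."""
    rng = np.random.default_rng(len(prev_tokens) + 1)
    vocab_proj = rng.standard_normal((enc.shape[1], 16))
    logits = enc.mean(axis=0) @ vocab_proj  # pool encoder states, project
    return int(np.argmax(logits))

mel = np.random.default_rng(2).standard_normal((100, 80))  # 100 frames, 80 mel bins
hidden = encode(mel)           # encoder pass (CoreML in the real project)
tokens: list[int] = []
for _ in range(5):             # autoregressive loop (MLX in the real project)
    tokens.append(decode_step(hidden, tokens))
print(len(tokens))
```

The point of the split is that the encoder is a single large batched pass (a good fit for the ANE), while the decoder is a latency-sensitive sequential loop (a good fit for MLX's lazy, unified-memory execution).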

Quick Start & Requirements

  • Installation: pip install -r requirements.txt for the base setup; add pip install coremltools ane_transformers for CoreML acceleration.
  • Prerequisites: Apple Silicon hardware is mandatory. CoreML acceleration requires cloning whisper.cpp and generating CoreML encoder models via ./scripts/generate_coreml_encoder.sh <model_name>.
  • Resource Footprint: The README claims significant speedups, particularly with CoreML, implying a low per-transcription compute and power cost; no formal benchmarks are published.
  • Documentation: Usage examples and detailed command-line arguments are available in the README.
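Taken together, the setup steps above amount to the following. The repository URL and the example model name are assumptions; the pip commands and the generation script path come from the README.

```shell
# Base install (repository URL assumed from the project name)
git clone https://github.com/altalt-org/Lightning-SimulWhisper
cd Lightning-SimulWhisper
pip install -r requirements.txt

# Optional: CoreML acceleration via the Apple Neural Engine
pip install coremltools ane_transformers
git clone https://github.com/ggerganov/whisper.cpp

# Generate the CoreML encoder for a chosen Whisper model
# (model name is an example; tiny through large-v3-turbo are supported)
./scripts/generate_coreml_encoder.sh base
```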

Highlighted Details

  • Achieves up to 18x encoder and 15x decoder speed increases compared to PyTorch.
  • CoreML acceleration provides notably lower power consumption than MLX-only configurations.
  • Supports a comprehensive range of Whisper models, from tiny to large-v3-turbo.
  • Features the AlignAtt policy for advanced simultaneous streaming speech recognition.
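The AlignAtt policy decides when a streaming decoder may commit a token by inspecting its cross-attention: if the token's strongest attention falls too close to the end of the audio received so far, the decoder waits for more input instead of emitting. Below is a minimal sketch of that rule; the function name, margin value, and toy attention vectors are illustrative assumptions, not the project's API.

```python
import numpy as np

def alignatt_emit(attn: np.ndarray, n_frames: int, frame_margin: int = 4) -> bool:
    """AlignAtt-style stopping rule (sketch): emit the candidate token only
    if the encoder frame it attends to most strongly lies at least
    `frame_margin` frames before the end of the received audio."""
    peak_frame = int(np.argmax(attn))
    return peak_frame < n_frames - frame_margin

# Toy cross-attention distributions over 10 received frames.
attn_early = np.array([0.0, 0.9, 0.1] + [0.0] * 7)  # peaks at frame 1
attn_late = np.array([0.0] * 9 + [1.0])             # peaks at the last frame
print(alignatt_emit(attn_early, n_frames=10))  # True  -> safe to emit
print(alignatt_emit(attn_late, n_frames=10))   # False -> wait for more audio
```

The margin guards against committing tokens whose acoustic evidence may still change as new audio arrives, which is what makes simultaneous (rather than batch) transcription stable.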

Maintenance & Community

Details regarding maintainers, community channels (e.g., Discord, Slack), or a public roadmap are not provided in the README.

Licensing & Compatibility

The specific open-source license governing this project is not explicitly stated in the provided README. Clarification is needed regarding commercial use or closed-source linking compatibility.

Limitations & Caveats

  • MLX-only implementations are noted for high power consumption.
  • CoreML model generation necessitates cloning an external repository (whisper.cpp) and executing specific scripts.
  • Accurate power consumption benchmarking on Apple Silicon remains an area for community contribution.
Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 31 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (author of Hugging Face Diffusers; Research Engineer at Mistral), Jiaming Song (Chief Scientist at Luma AI), and 1 more.

delayed-streams-modeling by kyutai-labs

Streaming multimodal sequence-to-sequence learning
3k stars · Top 0.6% on SourcePulse
Created 6 months ago · Updated 1 month ago