C/C++ port for high-performance Whisper ASR inference
This project provides a high-performance C/C++ implementation of OpenAI's Whisper automatic speech recognition (ASR) model, optimized for various hardware including Apple Silicon, x86, POWER, NVIDIA GPUs, and Intel/Ascend NPUs. It targets developers and researchers needing efficient, on-device speech-to-text capabilities across diverse platforms, offering significant speedups and reduced resource usage through techniques like quantization and mixed-precision inference.
How It Works
The core of the project is built upon the ggml machine learning library, enabling a lightweight, dependency-free C/C++ implementation of the Whisper model. This design makes it straightforward to integrate into applications across platforms. The implementation leverages hardware-specific optimizations such as ARM NEON, the Accelerate framework, Metal, and Core ML on Apple Silicon, and AVX/VSX intrinsics on x86/POWER architectures. It supports mixed F16/F32 precision and integer quantization, minimizing memory allocations and improving inference speed.
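For a sense of what integration looks like, here is a minimal sketch that embeds the library through the C API declared in whisper.h: it loads a ggml model, runs full transcription on a 16 kHz mono float buffer (stubbed with silence below), and prints the decoded segments. The model path is only an example, and exact function names and defaults may vary between releases.

```c
// Minimal sketch of embedding whisper.cpp via its C API (whisper.h).
// Assumes linking against the whisper library built by CMake and a ggml
// model on disk; audio decoding is stubbed with a silent 16 kHz buffer.
#include <stdio.h>
#include "whisper.h"

int main(void) {
    // Load the model; hardware backends (Metal, CUDA, etc.) are selected
    // according to how the library was built.
    struct whisper_context_params cparams = whisper_context_default_params();
    struct whisper_context * ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (!ctx) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // whisper.cpp expects 16 kHz mono float samples; a real application
    // would decode audio here. Five seconds of silence serve as a stub.
    enum { N_SAMPLES = 16000 * 5 };
    static float pcm[N_SAMPLES]; // zero-initialized

    struct whisper_full_params wparams =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    if (whisper_full(ctx, wparams, pcm, N_SAMPLES) != 0) {
        fprintf(stderr, "transcription failed\n");
        whisper_free(ctx);
        return 1;
    }

    // Print the decoded segments.
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```

Link the program against the whisper library produced by the CMake build; the repository's bundled examples follow the same pattern.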
Quick Start & Requirements
Download a Whisper model converted to ggml format (`sh ./models/download-ggml-model.sh base.en`), build the `whisper-cli` example (`cmake -B build && cmake --build build --config Release`), and transcribe an audio file (`./build/bin/whisper-cli -f samples/jfk.wav`).
Highlighted Details
Highlighted features include real-time audio streaming, experimental speaker segmentation (`tinydiarize`), and karaoke-style video generation.
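Timestamp-driven output such as karaoke-style rendering builds on the per-segment timings exposed by the C API. The hypothetical helper below is a sketch that reads start/end times through the `whisper_full_get_segment_t0`/`t1` accessors (which report times in 10 ms units); it assumes `whisper_full()` has already been run on the context.

```c
// Hypothetical helper: print start/end times for each decoded segment,
// e.g. as the basis for karaoke-style subtitles. Assumes whisper_full()
// has already been run on `ctx`; timestamps are reported in 10 ms units.
#include <stdio.h>
#include <stdint.h>
#include "whisper.h"

void print_segment_timings(struct whisper_context * ctx) {
    const int n = whisper_full_n_segments(ctx);
    for (int i = 0; i < n; ++i) {
        const int64_t t0_ms = whisper_full_get_segment_t0(ctx, i) * 10;
        const int64_t t1_ms = whisper_full_get_segment_t1(ctx, i) * 10;
        printf("[%6lld ms -> %6lld ms] %s\n",
               (long long) t0_ms, (long long) t1_ms,
               whisper_full_get_segment_text(ctx, i));
    }
}
```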
Maintenance & Community
The project is actively maintained by Georgi Gerganov and the `ggml-org` community. Discussions are encouraged for feedback and for sharing projects built with the library.
Licensing & Compatibility
The project is released under the MIT License, allowing for commercial use and integration into closed-source applications.
Limitations & Caveats
The `whisper-cli` example currently requires 16-bit WAV input; other formats must first be converted, for example with FFmpeg. Real-time streaming requires SDL2. Some advanced features, such as speaker segmentation and karaoke-style generation, are experimental.