GPGPU inference for OpenAI's Whisper ASR model
This project provides a high-performance, vendor-agnostic GPGPU inference engine for OpenAI's Whisper ASR model, specifically targeting Windows users. It offers a significantly faster and more lightweight alternative to Python-based implementations for transcribing audio files or live microphone input.
How It Works
The core of the project is a C++ implementation leveraging DirectCompute (Direct3D 11 compute shaders) for GPU acceleration. This approach avoids heavy runtime dependencies like PyTorch and CUDA, resulting in a much smaller footprint. It utilizes mixed F16/F32 precision and includes a built-in profiler for shader execution times. Media Foundation is used for broad audio format and capture device support.
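The mixed F16/F32 idea mentioned above, storing tensors in half precision to cut memory traffic while accumulating in full precision for accuracy, can be sketched in portable C++. This is an illustrative sketch only (helper names are hypothetical, subnormals and NaN are ignored); the real engine performs this work in HLSL compute shaders on the GPU:

```cpp
#include <cstdint>
#include <cstring>

// Truncate an IEEE 754 binary32 float to binary16 (normals only; a sketch,
// not a full round-to-nearest-even implementation).
uint16_t floatToHalf(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof(x));
    uint32_t sign = (x >> 16) & 0x8000;
    int32_t  exp  = (int32_t)((x >> 23) & 0xFF) - 127 + 15;
    uint32_t mant = (x >> 13) & 0x3FF;
    if (exp <= 0)  return (uint16_t)sign;            // flush tiny values to zero
    if (exp >= 31) return (uint16_t)(sign | 0x7C00); // overflow to infinity
    return (uint16_t)(sign | ((uint32_t)exp << 10) | mant);
}

// Expand binary16 back to binary32 (normals only, as above).
float halfToFloat(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    int32_t  exp  = (h >> 10) & 0x1F;
    uint32_t mant = (uint32_t)(h & 0x3FF);
    uint32_t x;
    if (exp == 0)       x = sign;                               // zero
    else if (exp == 31) x = sign | 0x7F800000u | (mant << 13);  // inf/NaN
    else                x = sign | ((uint32_t)(exp - 15 + 127) << 23) | (mant << 13);
    float f;
    std::memcpy(&f, &x, sizeof(f));
    return f;
}

// Mixed-precision dot product: F16 weights, F32 accumulator.
float dotMixed(const uint16_t* w, const float* x, size_t n) {
    float acc = 0.0f; // accumulate in F32 so rounding error does not compound
    for (size_t i = 0; i < n; ++i)
        acc += halfToFloat(w[i]) * x[i];
    return acc;
}
```

Halving the weight storage roughly halves GPU memory bandwidth per matrix multiply, which is the main win of this layout; keeping the accumulator in F32 avoids the precision loss that full-F16 arithmetic would introduce.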
Quick Start & Requirements
Download WhisperDesktop.zip from the Releases section and run WhisperDesktop.exe. A GPU with Direct3D 11 support is required.
Maintenance & Community
This appears to be a personal hobby project; the repository has been inactive for about a year, with no specific mention of contributors, sponsorships, or community channels like Discord/Slack.
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and closed-source linking.
Limitations & Caveats
Automatic language detection is not implemented. Real-time audio capture exhibits high latency (5-10 seconds) due to voice detection and audio chunking. Performance on discrete AMD or integrated Intel GPUs may not be fully optimized. The project is provided "as is" without warranty.
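The capture latency noted above comes from buffering: audio must accumulate until voice detection decides a chunk is complete before transcription can start. A minimal sketch of such a chunker, assuming a simple RMS-energy threshold (the project's actual voice-detection logic is not documented here, and the names below are hypothetical):

```cpp
#include <cmath>
#include <vector>

// Naive energy-threshold voice-activity chunker. Frames are buffered while
// they look like speech; a silent frame closes the chunk. The transcript for
// a chunk therefore cannot begin until the speaker pauses -- this buffering
// is one source of multi-second capture latency.
struct Chunker {
    std::vector<float> pending;  // samples awaiting a chunk boundary
    float threshold = 0.01f;     // RMS energy threshold (assumed value)

    // Feed one frame of samples; returns true when a finished chunk
    // has been moved into `chunk`.
    bool feed(const std::vector<float>& frame, std::vector<float>& chunk) {
        float energy = 0.0f;
        for (float s : frame) energy += s * s;
        float rms = std::sqrt(energy / (float)frame.size());
        if (rms >= threshold) {                // speech: keep buffering
            pending.insert(pending.end(), frame.begin(), frame.end());
            return false;
        }
        if (pending.empty()) return false;     // silence, nothing buffered
        chunk.swap(pending);                   // silence ends the chunk
        pending.clear();
        return true;
    }
};
```

Lowering the threshold or shortening the required silence reduces latency but risks splitting words across chunks, which degrades transcription quality; the 5-10 second figure reflects that trade-off.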