Distil-Whisper: a distilled, faster speech recognition variant of Whisper
Top 12.7% on sourcepulse
Distil-Whisper offers a distilled English-only speech recognition model that is significantly faster and smaller than the original Whisper, while maintaining comparable accuracy. It is designed for users needing efficient speech-to-text capabilities, from researchers to developers integrating ASR into applications.
How It Works
Distil-Whisper employs knowledge distillation: it retains Whisper's full encoder but keeps only two decoder layers. This reduced architecture is trained to mimic the original model's outputs on a large, diverse corpus of pseudo-labeled audio, minimizing a weighted combination of the KL divergence to the teacher's token distribution and the cross-entropy against the pseudo-labels. The result is a model that is roughly 6x faster and 49% smaller, while staying within about 1% WER of the original Whisper on out-of-distribution test data.
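The snippet below is a minimal PyTorch sketch of that combined objective. The loss weights, tensor shapes, and function name are illustrative assumptions, not the authors' training code.

```python
# Sketch of a distillation objective combining cross-entropy on pseudo-labels
# with KL divergence to the teacher's token distribution.
# Weights and shapes are illustrative, not the values from the Distil-Whisper run.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, pseudo_labels,
                      ce_weight=1.0, kl_weight=0.8):
    # Cross-entropy against the pseudo-labelled transcript tokens.
    ce = F.cross_entropy(student_logits.transpose(1, 2), pseudo_labels)
    # KL divergence between student and teacher next-token distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return ce_weight * ce + kl_weight * kl

# Toy example: batch of 2 sequences, 10 decoder steps, Whisper-sized vocabulary.
student = torch.randn(2, 10, 51865)
teacher = torch.randn(2, 10, 51865)
labels = torch.randint(0, 51865, (2, 10))
print(distillation_loss(student, teacher, labels))
```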
Quick Start & Requirements
```bash
pip install --upgrade transformers accelerate datasets[audio]
```
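A minimal transcription sketch using the Hugging Face Transformers pipeline follows. The checkpoint name (`distil-whisper/distil-large-v2`) and the dummy LibriSpeech sample are assumptions for illustration; swap in your own model ID and audio.

```python
# Sketch: load a Distil-Whisper checkpoint and transcribe a short audio clip.
# Model ID, device, dtype, and the sample dataset are illustrative choices.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v2"  # assumed checkpoint name
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)

# Any 16 kHz audio array or file path works; here we use a small test clip.
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[0]["audio"]
print(pipe(sample)["text"])
```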
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats