MiscellaneousStuff: Accelerating speech recognition on consumer CPUs
Top 98.3% on SourcePulse
This project offers experimental modifications to OpenAI's Whisper ASR model, applying dynamic quantization to enhance inference speed and throughput on CPU-only hardware. It targets users with consumer-grade laptops or desktops lacking dedicated GPUs, enabling faster transcription by making larger Whisper models more efficient.
How It Works
The approach replaces Whisper's custom Linear() layers with stock torch.nn.Linear() and applies dynamic quantization (torch.qint8) via torch.quantization.quantize_dynamic. This reduces weight precision to int8, decreasing computational load and memory bandwidth for CPU inference.
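Dynamic quantization only targets stock torch.nn.Linear modules, which is why the layer swap matters. A minimal sketch of the idea on a stand-in Linear stack (not the actual Whisper code):

```python
import torch

# Hypothetical stand-in for a Whisper feed-forward block; dynamic
# quantization matches on torch.nn.Linear, so Whisper's custom Linear()
# must first be swapped for the stock class.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
)

# Replace fp32 Linear layers with int8 dynamically quantized equivalents.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    y = quantized(x)
```

quantize_dynamic stores weights as int8 and quantizes activations on the fly at each Linear call, so no calibration dataset is needed.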
Quick Start & Requirements
git submodule init && git submodule update
pip install -e ./whisper
Highlighted Details
- Quantized CPU speedups: …x (base), 2.76x (small), 2.62x (medium) over non-quantized CPU versions.
- The tiny quantized model exhibits a 0.74x slowdown vs. the original CPU fp32 model, indicating performance varies by model size.
- Further reported figures: …x (tiny), 9.37x (base), 4.34x (small), 1.29x (medium).
Maintenance & Community
The project appears inactive, with last activity roughly three years ago; no details on maintainers, community channels, or roadmap are available.
Licensing & Compatibility
License is unspecified, posing a potential blocker for commercial use or integration.
Limitations & Caveats
The project is experimental, and quantization can degrade performance for smaller models (e.g., tiny is slower than the original CPU fp32 model). The unspecified license is a significant adoption blocker.
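Since gains vary by model size and hardware, it is worth benchmarking before committing. A minimal timing sketch using a plain torch Linear stack as a stand-in workload (not the actual Whisper model):

```python
import time

import torch


def bench(model, x, iters=20):
    """Average seconds per forward pass on CPU (one warm-up run first)."""
    with torch.no_grad():
        model(x)  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return (time.perf_counter() - start) / iters


# Stand-in for a transformer's Linear-heavy CPU workload.
fp32 = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(8)])
int8 = torch.quantization.quantize_dynamic(
    fp32, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(16, 1024)
t_fp32, t_int8 = bench(fp32, x), bench(int8, x)
print(f"int8 speedup: {t_fp32 / t_int8:.2f}x")  # >1 means quantization helped
```

On small layers the per-call quantization overhead can outweigh the int8 matmul savings, which is consistent with the tiny model's reported slowdown.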