openai-whisper-cpu  by MiscellaneousStuff

Accelerating speech recognition on consumer CPUs

Created 3 years ago
259 stars

Top 97.8% on SourcePulse

Project Summary

This project offers experimental modifications to OpenAI's Whisper ASR model, applying dynamic quantization to enhance inference speed and throughput on CPU-only hardware. It targets users with consumer-grade laptops or desktops lacking dedicated GPUs, enabling faster transcription by making larger Whisper models more efficient.

How It Works

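The approach replaces Whisper's custom Linear() layers with standard torch.nn.Linear() modules, then applies dynamic quantization (torch.qint8) via torch.quantization.quantize_dynamic. Dynamic quantization stores the linear-layer weights as 8-bit integers and quantizes activations on the fly at inference time, which reduces computational load and memory bandwidth for CPU inference.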
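
To make the numeric idea concrete, here is a minimal pure-Python sketch of int8 dynamic quantization applied to a single linear layer. It is an illustration only, not the project's code or PyTorch's implementation: it assumes symmetric quantization with one scale per tensor, and the weight matrix, bias, input, and helper names (quantize, dynamic_quantized_linear) are hypothetical.

```python
# Illustrative only: symmetric int8 quantization of one linear layer.
# PyTorch's real kernels differ (zero points, vectorized integer matmul).

def quantize(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole list."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dynamic_quantized_linear(x, q_weight_rows, w_scale, bias):
    """Compute y = W x + b with int8 weights and a per-call activation scale."""
    q_x, x_scale = quantize(x)                # "dynamic": derived from this input
    out = []
    for q_row, b in zip(q_weight_rows, bias):
        acc = sum(qw * qx for qw, qx in zip(q_row, q_x))   # integer accumulation
        out.append(acc * w_scale * x_scale + b)            # dequantize, add bias
    return out


# Hypothetical 2x3 weight matrix, bias, and input vector.
weight = [[0.50, -1.20, 0.30], [0.05, 0.90, -0.70]]
bias = [0.1, -0.2]
x = [1.0, 0.5, -2.0]

# Weights are quantized once, ahead of time.
flat_q, w_scale = quantize([w for row in weight for w in row])
q_weight = [flat_q[i:i + 3] for i in range(0, len(flat_q), 3)]

# Compare against the full-precision result.
y_fp32 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
y_int8 = dynamic_quantized_linear(x, q_weight, w_scale, bias)
max_err = max(abs(a - b) for a, b in zip(y_fp32, y_int8))
print(y_fp32, y_int8, max_err)
```

The quantized output tracks the full-precision one with a small rounding error. The speed benefit on real hardware comes from integer arithmetic and smaller weights, which this sketch does not attempt to measure.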

Quick Start & Requirements

  • Primary install / run command:
    • Initialize and update submodules: git submodule init && git submodule update
    • Install: pip install -e ./whisper
  • Non-default prerequisites and dependencies: Python, PyTorch. Designed for CPU inference.
  • Estimated setup time or resource footprint: Not specified.
  • Links to official quick-start, docs, demo, or other relevant pages: None provided.

Highlighted Details

  • Quantization yields CPU speedups over the non-quantized CPU versions for the base, small, and medium models: 1.62x (base), 2.76x (small), 2.62x (medium).
  • The quantized tiny model runs at 0.74x the speed of the original fp32 CPU model (i.e., it is slower), so the benefit varies with model size.
  • Reported real-time transcription factors: 9.67x (tiny), 9.37x (base), 4.34x (small), 1.29x (medium).
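
As a rough guide to reading these figures, the sketch below shows one common way such ratios are computed. It assumes that a speedup is baseline time divided by quantized time and that a real-time factor is audio duration divided by processing time; the summary does not state the exact definitions used, and the timings here are hypothetical rather than taken from the project.

```python
def speedup(baseline_seconds, quantized_seconds):
    """How many times faster the quantized run is than the baseline."""
    return baseline_seconds / quantized_seconds


def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio transcribed per second of compute (above 1 is faster than real time)."""
    return audio_seconds / processing_seconds


# Hypothetical measurements for a 60-second clip.
audio_seconds = 60.0
fp32_seconds = 20.0
int8_seconds = 8.0

print(speedup(fp32_seconds, int8_seconds))           # 2.5
print(realtime_factor(audio_seconds, int8_seconds))  # 7.5
```

Under these definitions, a speedup below 1 (such as 0.74x) means the quantized model took longer than the baseline.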

Maintenance & Community

No details on maintainers, community channels, or roadmap are available.

Licensing & Compatibility

License is unspecified, posing a potential blocker for commercial use or integration.

Limitations & Caveats

The project is experimental. Quantization can slow down smaller models (e.g., the quantized tiny model is slower than the original fp32 CPU model). The unspecified license is a significant adoption blocker.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
