openai-whisper-cpu by MiscellaneousStuff

Accelerating speech recognition on consumer CPUs

Created 3 years ago
257 stars

Top 98.3% on SourcePulse

Project Summary

This project offers experimental modifications to OpenAI's Whisper ASR model, applying dynamic quantization to enhance inference speed and throughput on CPU-only hardware. It targets users with consumer-grade laptops or desktops lacking dedicated GPUs, enabling faster transcription by making larger Whisper models more efficient.

How It Works

The approach replaces Whisper's custom Linear() layers with standard torch.nn.Linear() layers so that dynamic quantization (torch.qint8) can be applied via torch.quantization.quantize_dynamic. Converting the linear weights to int8 reduces compute and memory-bandwidth demands during CPU inference.
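
As a rough illustration, here is a minimal sketch of the quantization step, assuming the modified fork (installed as the whisper package via the submodule install above) exposes the standard load_model/transcribe API; the model size and audio path are placeholders.

    # Minimal sketch: dynamic int8 quantization of a Whisper model for CPU inference.
    # Assumes the project's modified whisper fork is installed and "audio.wav" exists.
    import torch
    import whisper

    model = whisper.load_model("base", device="cpu")   # fp32 baseline on CPU
    model.eval()

    # Swap every torch.nn.Linear for a dynamically quantized int8 equivalent;
    # activations stay fp32 and are quantized on the fly at runtime.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    result = quantized.transcribe("audio.wav")
    print(result["text"])

Dynamic quantization only converts the module types listed in its mapping, which is presumably why the fork swaps Whisper's custom Linear subclass for the plain torch.nn.Linear layer.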

Quick Start & Requirements

  • Primary install / run command:
    • Initialize and update submodules: git submodule init && git submodule update
    • Install: pip install -e ./whisper
  • Non-default prerequisites and dependencies: Python, PyTorch. Designed for CPU inference.
  • Estimated setup time or resource footprint: Not specified.
  • Links to official quick-start, docs, demo, or other relevant pages: None provided.

Highlighted Details

  • Quantization yields significant CPU speedups for the larger Whisper models: 1.62x (base), 2.76x (small), and 2.62x (medium) over the non-quantized fp32 CPU versions (see the timing sketch after this list).
  • The quantized tiny model actually runs slower, at 0.74x the speed of the original fp32 CPU model, so the benefit depends on model size.
  • Quantized models transcribe faster than real time: 9.67x (tiny), 9.37x (base), 4.34x (small), and 1.29x (medium) relative to audio duration.
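
Figures like these could be roughly reproduced with a timing comparison along the lines of the sketch below; the file name, model size, and resulting ratios are illustrative assumptions and will vary by machine.

    # Hedged sketch: compare fp32 vs. int8 CPU transcription time on one file.
    import time
    import torch
    import whisper

    def transcribe_seconds(model, path="audio.wav"):
        start = time.perf_counter()
        model.transcribe(path)
        return time.perf_counter() - start

    fp32 = whisper.load_model("base", device="cpu").eval()
    # quantize_dynamic (inplace=False by default) returns a quantized copy,
    # so the fp32 model remains usable for the baseline measurement.
    int8 = torch.quantization.quantize_dynamic(
        fp32, {torch.nn.Linear}, dtype=torch.qint8
    )

    t_fp32 = transcribe_seconds(fp32)
    t_int8 = transcribe_seconds(int8)
    print(f"fp32: {t_fp32:.1f}s  int8: {t_int8:.1f}s  speedup: {t_fp32 / t_int8:.2f}x")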

Maintenance & Community

No details on maintainers, community channels, or roadmap are available.

Licensing & Compatibility

License is unspecified, posing a potential blocker for commercial use or integration.

Limitations & Caveats

The project is experimental; quantization can degrade performance for smaller models (e.g., the tiny model is slower than the original CPU fp32 version), and the unspecified license is a significant adoption blocker.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.4%
2k
Post-training quantization research paper for large language models
Created 3 years ago
Updated 1 year ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

0.1%
3k
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago
Updated 5 months ago