whisper.cpp by ggml-org

C/C++ port for high-performance Whisper ASR inference

Created 3 years ago
43,286 stars

Top 0.6% on SourcePulse

Project Summary

This project provides a high-performance C/C++ implementation of OpenAI's Whisper automatic speech recognition (ASR) model, optimized for various hardware including Apple Silicon, x86, POWER, NVIDIA GPUs, and Intel/Ascend NPUs. It targets developers and researchers needing efficient, on-device speech-to-text capabilities across diverse platforms, offering significant speedups and reduced resource usage through techniques like quantization and mixed-precision inference.

How It Works

The core of the project is built upon the ggml machine learning library, enabling a lightweight, dependency-free C/C++ implementation of the Whisper model. This design facilitates easy integration into various applications and platforms. It leverages hardware-specific optimizations such as ARM NEON, Accelerate framework, Metal, and Core ML for Apple Silicon, and AVX/VSX intrinsics for x86/POWER architectures. The implementation supports mixed F16/F32 precision and integer quantization, minimizing memory allocations and improving inference speed.
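The library is consumed through a small C API declared in whisper.h: load a ggml model, run whisper_full over 16 kHz mono float32 PCM, then iterate the decoded segments. The sketch below illustrates that call pattern; it is a minimal example written against recent releases, and exact function names and parameters may differ between versions.

    // Minimal sketch of the whisper.cpp C API (whisper.h); names follow recent
    // releases and may differ slightly in older ones.
    #include <stdio.h>
    #include <stdlib.h>
    #include "whisper.h"

    int main(void) {
        // Load a ggml-format model (example path)
        struct whisper_context_params cparams = whisper_context_default_params();
        struct whisper_context * ctx =
            whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
        if (!ctx) return 1;

        // whisper_full expects 16 kHz mono float32 PCM; one second of silence
        // stands in here for audio decoded elsewhere.
        const int n_samples = WHISPER_SAMPLE_RATE; // 16000
        float * pcm = calloc(n_samples, sizeof(float));

        struct whisper_full_params wparams =
            whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
        wparams.language = "en";

        if (whisper_full(ctx, wparams, pcm, n_samples) == 0) {
            // Print each transcribed segment
            for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
                printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }

        free(pcm);
        whisper_free(ctx);
        return 0;
    }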

Quick Start & Requirements

  • Install/Run: Clone the repository, download a ggml-formatted Whisper model (e.g., sh ./models/download-ggml-model.sh base.en), build the whisper-cli example (cmake -B build && cmake --build build --config Release), and transcribe an audio file (./build/bin/whisper-cli -f samples/jfk.wav).
  • Prerequisites: C++ compiler, CMake, FFmpeg (for non-WAV formats). Optional: CUDA for NVIDIA GPUs, OpenVINO for Intel hardware, Core ML for Apple Neural Engine, etc.
  • Setup Time: Building and downloading a base model is typically under 5 minutes.
  • Links: Official Docs, Models

Highlighted Details

  • Supports a wide range of hardware accelerators: Apple Silicon (Metal, Core ML), NVIDIA (cuBLAS), Intel (OpenVINO), Ascend NPU, Moore Threads GPUs (MUSA), Vulkan.
  • Offers integer quantization (e.g., Q5_0) for reduced memory footprint and faster inference.
  • Provides experimental features like word-level timestamps, speaker segmentation (via tinydiarize), and karaoke-style video generation.
  • Includes bindings for Rust, JavaScript, Go, Java, Ruby, .NET, Python, R, and Unity.
  • Offers a precompiled XCFramework for easy integration into Swift projects.

Maintenance & Community

The project is actively maintained by Georgi Gerganov and the ggml-org community. Discussions are encouraged for feedback and sharing projects.

Licensing & Compatibility

The project is released under the MIT License, allowing for commercial use and integration into closed-source applications.

Limitations & Caveats

The whisper-cli example currently requires 16-bit WAV files; other formats need conversion via FFmpeg. The real-time streaming example (whisper-stream) requires SDL2 for audio capture. Advanced features such as speaker segmentation (tinydiarize) and karaoke-style video generation remain experimental.
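When using the library directly instead of whisper-cli, the same constraint shows up in a different form: whisper_full consumes 16 kHz mono float32 PCM, so the 16-bit integer samples stored in such WAV files must be scaled into the [-1, 1] range. A minimal conversion helper, assuming the samples are already 16 kHz mono, might look like this (pcm16_to_f32 is an illustrative name, not part of the library):

    #include <stddef.h>
    #include <stdint.h>

    // Illustrative helper (not part of whisper.h): scale 16-bit PCM samples
    // into the float32 range [-1, 1] expected by whisper_full.
    void pcm16_to_f32(const int16_t * in, float * out, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            out[i] = (float) in[i] / 32768.0f;
        }
    }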

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 17
  • Issues (30d): 26

Star History

876 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

Top 0.3% on SourcePulse
3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago
Updated 2 months ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

Top 0.1% on SourcePulse
6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago
Updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

Top 0.2% on SourcePulse
6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago
Updated 3 months ago