whisper.cpp by ggml-org

C/C++ port for high-performance Whisper ASR inference

Created 3 years ago
43,286 stars

Top 0.6% on SourcePulse

Project Summary

This project provides a high-performance C/C++ implementation of OpenAI's Whisper automatic speech recognition (ASR) model, optimized for various hardware including Apple Silicon, x86, POWER, NVIDIA GPUs, and Intel/Ascend NPUs. It targets developers and researchers needing efficient, on-device speech-to-text capabilities across diverse platforms, offering significant speedups and reduced resource usage through techniques like quantization and mixed-precision inference.

How It Works

The core of the project is built upon the ggml machine learning library, enabling a lightweight, dependency-free C/C++ implementation of the Whisper model. This design facilitates easy integration into various applications and platforms. It leverages hardware-specific optimizations such as ARM NEON, Accelerate framework, Metal, and Core ML for Apple Silicon, and AVX/VSX intrinsics for x86/POWER architectures. The implementation supports mixed F16/F32 precision and integer quantization, minimizing memory allocations and improving inference speed.
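The library is consumed through a small C API declared in whisper.h: load a ggml model, run whisper_full over 16 kHz mono float32 PCM, then iterate the decoded segments. The sketch below illustrates that call pattern; it is a minimal example written against recent releases, and exact function names and parameters may differ between versions.

    // Minimal sketch of the whisper.cpp C API (whisper.h); names follow recent
    // releases and may differ slightly in older ones.
    #include <stdio.h>
    #include <stdlib.h>
    #include "whisper.h"

    int main(void) {
        // Load a ggml-format model (example path)
        struct whisper_context_params cparams = whisper_context_default_params();
        struct whisper_context * ctx =
            whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
        if (!ctx) return 1;

        // whisper_full expects 16 kHz mono float32 PCM; one second of silence
        // stands in here for audio decoded elsewhere.
        const int n_samples = WHISPER_SAMPLE_RATE; // 16000
        float * pcm = calloc(n_samples, sizeof(float));

        struct whisper_full_params wparams =
            whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
        wparams.language = "en";

        if (whisper_full(ctx, wparams, pcm, n_samples) == 0) {
            // Print each transcribed segment
            for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
                printf("%s\n", whisper_full_get_segment_text(ctx, i));
            }
        }

        free(pcm);
        whisper_free(ctx);
        return 0;
    }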

Quick Start & Requirements

  • Install/Run: Clone the repository, download a ggml-formatted Whisper model (e.g., sh ./models/download-ggml-model.sh base.en), build the whisper-cli example (cmake -B build && cmake --build build --config Release), and transcribe an audio file (./build/bin/whisper-cli -f samples/jfk.wav).
  • Prerequisites: C++ compiler, CMake, FFmpeg (for non-WAV formats). Optional: CUDA for NVIDIA GPUs, OpenVINO for Intel hardware, Core ML for Apple Neural Engine, etc.
  • Setup Time: Building and downloading a base model is typically under 5 minutes.
  • Links: Official Docs, Models

Highlighted Details

  • Supports a wide range of hardware accelerators: Apple Silicon (Metal, Core ML), NVIDIA (cuBLAS), Intel (OpenVINO), Ascend NPU, Moore Threads GPUs (MUSA), Vulkan.
  • Offers integer quantization (e.g., Q5_0) for reduced memory footprint and faster inference.
  • Provides experimental features like word-level timestamps, speaker segmentation (via tinydiarize), and karaoke-style video generation.
  • Includes bindings for Rust, JavaScript, Go, Java, Ruby, .NET, Python, R, and Unity.
  • Offers a precompiled XCFramework for easy integration into Swift projects.

Maintenance & Community

The project is actively maintained by Georgi Gerganov and the ggml-org community. Discussions are encouraged for feedback and sharing projects.

Licensing & Compatibility

The project is released under the MIT License, allowing for commercial use and integration into closed-source applications.

Limitations & Caveats

The whisper-cli example currently requires 16-bit WAV files; other formats need conversion via FFmpeg. The real-time streaming example (whisper-stream) requires SDL2 for audio capture. Advanced features such as speaker segmentation (tinydiarize) and karaoke-style video generation remain experimental.
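When using the library directly instead of whisper-cli, the same constraint shows up in a different form: whisper_full consumes 16 kHz mono float32 PCM, so the 16-bit integer samples stored in such WAV files must be scaled into the [-1, 1] range. A minimal conversion helper, assuming the samples are already 16 kHz mono, might look like this (pcm16_to_f32 is an illustrative name, not part of the library):

    #include <stddef.h>
    #include <stdint.h>

    // Illustrative helper (not part of whisper.h): scale 16-bit PCM samples
    // into the float32 range [-1, 1] expected by whisper_full.
    void pcm16_to_f32(const int16_t * in, float * out, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            out[i] = (float) in[i] / 32768.0f;
        }
    }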

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 17
  • Issues (30d): 26

Star History

876 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

Top 0.3% on SourcePulse
3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago
Updated 2 months ago
Starred by Wing Lian (Founder of Axolotl AI) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

Top 0.1% on SourcePulse
6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago
Updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

Top 0.2% on SourcePulse
6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago
Updated 3 months ago