faster-whisper by SYSTRAN

Faster Whisper reimplementation using CTranslate2

created 2 years ago
17,324 stars

Top 2.7% on sourcepulse

View on GitHub
Project Summary

This project provides a significantly faster and more memory-efficient implementation of OpenAI's Whisper speech-to-text model, leveraging the CTranslate2 inference engine. It targets developers and researchers needing high-throughput transcription, offering up to 4x speed improvements and reduced resource consumption, especially with 8-bit quantization.

How It Works

Faster-Whisper reimplements the Whisper architecture using CTranslate2, a specialized C++ inference engine optimized for Transformer models. This allows for efficient execution on both CPU and GPU, with particular benefits from 8-bit quantization, which drastically reduces memory usage and speeds up computation without significant accuracy loss.
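The memory saving from 8-bit mode comes from storing weights as `int8` plus a per-tensor float scale instead of 32-bit floats. The sketch below is purely illustrative of that idea, not CTranslate2's actual kernels:

```python
# Illustrative symmetric INT8 quantization: store weights as int8 plus one
# shared float scale, then dequantize (or compute directly in int8) later.

def quantize_int8(weights):
    """Map float weights into [-127, 127] with a shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# int8 storage is 4x smaller than float32; the rounding error per weight
# is bounded by half the quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

A real engine applies this per layer (often per channel) and runs matrix multiplications directly on the int8 values, which is where the speedup comes from.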

Quick Start & Requirements

  • Install: pip install faster-whisper
  • Prerequisites:
    • Python 3.9+
    • GPU: the NVIDIA libraries cuBLAS for CUDA 12 and cuDNN 9 (older CUDA/cuDNN versions require downgrading ctranslate2). Installation via Docker or pip on Linux is supported.
    • CPU: no special hardware requirements.
  • Setup: Minimal setup for basic Python usage. GPU setup requires NVIDIA driver and library installation.
  • Docs: https://github.com/SYSTRAN/faster-whisper

Highlighted Details

  • Up to 4x faster than openai/whisper on GPU (FP16) and significantly faster on CPU (INT8).
  • Supports 8-bit quantization on CPU and GPU for reduced memory footprint (e.g., 2926MB VRAM for Large-v2 INT8 vs. 4708MB FP16).
  • Batch transcription support for increased throughput.
  • Integrates Silero VAD for optional silence filtering.
  • Provides word-level timestamps.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

  • GPU execution requires CUDA 12 and cuDNN 9; older CUDA/cuDNN versions are only supported by downgrading ctranslate2.
  • CPU benchmarks were measured on specific hardware (Intel Core i7-12700K); performance on other systems may vary.
Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 8
  • Star History: 1,665 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (founder of Ostris), and
1 more.

nunchaku by nunchaku-tech

  • High-performance 4-bit diffusion model inference engine
  • Top 2.1% · 3k stars
  • created 8 months ago, updated 14 hours ago