faster-whisper by SYSTRAN

Faster Whisper reimplementation using CTranslate2

created 2 years ago
17,324 stars

Top 2.7% on sourcepulse

View on GitHub
Project Summary

This project provides a significantly faster and more memory-efficient implementation of OpenAI's Whisper speech-to-text model, leveraging the CTranslate2 inference engine. It targets developers and researchers needing high-throughput transcription, offering up to 4x speed improvements and reduced resource consumption, especially with 8-bit quantization.

How It Works

Faster-Whisper reimplements the Whisper architecture using CTranslate2, a specialized C++ inference engine optimized for Transformer models. This allows for efficient execution on both CPU and GPU, with particular benefits from 8-bit quantization, which drastically reduces memory usage and speeds up computation without significant accuracy loss.
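The memory saving from 8-bit mode comes from storing weights as `int8` plus a per-tensor float scale instead of 32-bit floats. The sketch below is purely illustrative of that idea, not CTranslate2's actual kernels:

```python
# Illustrative symmetric INT8 quantization: store weights as int8 plus one
# shared float scale, then dequantize (or compute directly in int8) later.

def quantize_int8(weights):
    """Map float weights into [-127, 127] with a shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# int8 storage is 4x smaller than float32; the rounding error per weight
# is bounded by half the quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

A real engine applies this per layer (often per channel) and runs matrix multiplications directly on the int8 values, which is where the speedup comes from.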

Quick Start & Requirements

  • Install: pip install faster-whisper
  • Prerequisites:
    • Python 3.9+
    • GPU: the NVIDIA libraries cuBLAS for CUDA 12 and cuDNN 9 (older CUDA/cuDNN versions require downgrading ctranslate2). Installation via Docker or pip on Linux is supported.
    • CPU: no special hardware requirements.
  • Setup: Minimal setup for basic Python usage. GPU setup requires NVIDIA driver and library installation.
  • Docs: https://github.com/SYSTRAN/faster-whisper

Highlighted Details

  • Up to 4x faster than openai/whisper on GPU (FP16) and significantly faster on CPU (INT8).
  • Supports 8-bit quantization on CPU and GPU for reduced memory footprint (e.g., 2926MB VRAM for Large-v2 INT8 vs. 4708MB FP16).
  • Batch transcription support for increased throughput.
  • Integrates Silero VAD for optional silence filtering.
  • Provides word-level timestamps.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

  • GPU execution requires CUDA 12 and cuDNN 9; older CUDA/cuDNN versions are only supported by downgrading ctranslate2.
  • CPU benchmarks were measured on specific hardware (Intel Core i7-12700K); performance on other systems may vary.
Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 2
  • Issues (30d): 8
  • Star History: 1,665 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (founder of Ostris), and
1 more.

nunchaku by nunchaku-tech

  • High-performance 4-bit diffusion model inference engine
  • Top 2.1% · 3k stars
  • created 8 months ago, updated 14 hours ago