STT benchmark framework for comparing speech-to-text engines
This repository provides a minimalist and extensible framework for benchmarking speech-to-text (STT) engines. It is designed for researchers and developers who need to compare the performance of different STT solutions across multiple datasets and languages, reporting metrics such as word error rate (WER), core-hour (the CPU hours required to process one hour of audio), and model size.
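To make the accuracy metric concrete, here is a minimal sketch of how WER is typically computed as the word-level edit distance between a reference and a hypothesis transcript, normalized by reference length. The `word_error_rate` helper is illustrative and not part of this repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("grey" -> "gray") out of four reference words -> WER = 0.25
print(word_error_rate("the grey fox ran", "the gray fox ran"))
```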
How It Works
The framework orchestrates the process of transcribing audio files using different STT engines and then evaluates the results against reference transcripts. It supports several popular STT services (Amazon Transcribe, Azure Speech-to-Text, Google Speech-to-Text, IBM Watson) and open-source models (OpenAI Whisper, Picovoice Cheetah/Leopard). The evaluation is based on standard metrics, providing a quantitative comparison of accuracy and computational efficiency.
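As a rough illustration of that orchestration, each backend can sit behind a common transcription interface and be scored against reference transcripts in a single loop. The class and function names below are hypothetical, not taken from this repository's code.

```python
from abc import ABC, abstractmethod

class Engine(ABC):
    """Common interface an STT backend (Whisper, Amazon Transcribe, ...) implements."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the hypothesis transcript for one audio file."""

def benchmark(engine: Engine, samples: list[tuple[str, str]]) -> float:
    """Average WER of `engine` over (audio_path, reference_transcript) pairs."""
    total = 0.0
    for audio_path, reference in samples:
        hypothesis = engine.transcribe(audio_path)
        total += word_error_rate(reference, hypothesis)  # helper defined above
    return total / len(samples)
```

Keeping the engine behind one abstract method is what makes it cheap to add a new service or model: only the `transcribe` adapter changes, while dataset loading and scoring stay shared.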
Quick Start & Requirements
Install the Python dependencies:

pip3 install -r requirements.txt
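As a quick sanity check after installation, you can transcribe a sample file with one of the supported open-source engines directly. The snippet below uses the `openai-whisper` package on its own and assumes a local `sample.wav`; it bypasses this framework's own runner scripts, whose exact invocation is documented in the repository.

```python
import whisper  # from the openai-whisper package

model = whisper.load_model("base")       # small model, fast enough for a smoke test
result = model.transcribe("sample.wav")  # path to any local audio file
print(result["text"])
```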
Highlighted Details
Maintenance & Community
Last recorded activity was about 1 month ago, and the project is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats