speech-to-text-benchmark by Picovoice

STT benchmark framework for comparing speech-to-text engines

created 7 years ago
657 stars

Top 51.9% on sourcepulse

Project Summary

This repository provides a minimalist and extensible framework for benchmarking various speech-to-text (STT) engines. It is designed for researchers and developers who need to compare the performance of different STT solutions across multiple datasets and languages, offering metrics like Word Error Rate (WER), Core-Hour, and Model Size.

How It Works

The framework orchestrates the process of transcribing audio files using different STT engines and then evaluates the results against reference transcripts. It supports several popular STT services (Amazon Transcribe, Azure Speech-to-Text, Google Speech-to-Text, IBM Watson) and open-source models (OpenAI Whisper, Picovoice Cheetah/Leopard). The evaluation is based on standard metrics, providing a quantitative comparison of accuracy and computational efficiency.
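Word Error Rate is the word-level edit distance between an engine's hypothesis and the reference transcript, normalized by the number of reference words. A minimal sketch of that computation (a generic dynamic-programming implementation, not the repository's actual code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over word sequences via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis that drops one word out of a three-word reference scores a WER of 1/3.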

Quick Start & Requirements

  • Install FFmpeg.
  • Install Python dependencies: pip3 install -r requirements.txt.
  • Tested on Ubuntu 22.04.
  • Requires API keys/credentials for cloud-based engines and potentially model files for local engines.
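Since the benchmark shells out to FFmpeg for audio handling, a quick pre-flight check can save a failed run. A small sketch (the helper name is illustrative, not part of the repository):

```python
import shutil
import sys


def tool_available(name: str) -> bool:
    """Return True if an executable named `name` is on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not tool_available("ffmpeg"):
        sys.exit("FFmpeg not found on PATH; install it before running the benchmark.")
    print("FFmpeg found; next, install Python deps: pip3 install -r requirements.txt")
```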

Highlighted Details

  • Benchmarks WER for English, French, German, Italian, Spanish, and Portuguese.
  • Includes "Core-Hour" and "Model Size" metrics for offline engines, with detailed performance data on a specific AMD CPU configuration.
  • Offers direct comparison tables for WER across multiple datasets and languages.
  • Supports various Whisper model sizes and Picovoice engines.
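The Core-Hour metric captures how many hours of CPU core time an offline engine needs to transcribe one hour of audio. A hedged sketch of that arithmetic (a plain reading of the metric's name, not taken from the repository's code):

```python
def core_hours_per_audio_hour(processing_seconds: float,
                              audio_seconds: float,
                              num_cores: int = 1) -> float:
    """Core-Hour: CPU core time spent per hour of audio transcribed.

    Core-seconds consumed divided by audio-seconds processed equals
    core-hours per audio-hour, so no unit conversion is needed.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return (processing_seconds * num_cores) / audio_seconds
```

For example, an engine that takes 360 seconds on a single core to transcribe one hour (3600 s) of audio uses 0.1 Core-Hours per hour of audio; lower is cheaper to run.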

Maintenance & Community

  • Developed by Picovoice.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but the underlying STT engines have their own licensing terms and usage costs.
  • Commercial use depends on the licensing of the individual STT services and models being benchmarked.

Limitations & Caveats

  • Benchmarking results are specific to the hardware and configuration used for testing.
  • Cloud-based engines are excluded from Core-Hour and Model Size metrics.
  • Some engines (e.g., Whisper Large) were omitted from specific benchmark tests.
Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days

Explore Similar Projects

Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 19 more.

  • whisper by openai — Speech recognition model for multilingual transcription/translation (0.4%, 86k stars; created 2 years ago, updated 1 month ago)