linto-ai
ASR tool for word-level timestamps and confidence scores using Whisper
Top 17.9% on SourcePulse
This project provides word-level timestamps and confidence scores for multilingual Automatic Speech Recognition (ASR) using OpenAI's Whisper models. It addresses the limitation of Whisper's segment-level timestamps, offering a more granular and accurate transcription for researchers and developers working with speech data.
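As a rough illustration of how the word-level output might be consumed, the sketch below walks a transcription result and flags low-confidence words. It assumes the result is a dict with a `segments` list whose entries hold a `words` list of `text`, `start`, `end`, and `confidence` fields. The helper names (`iter_words`, `low_confidence_words`, `transcribe_file`), the 0.5 threshold, and the hand-written `sample` data are illustrative assumptions, not part of the project.

```python
def iter_words(result):
    """Yield (text, start, end, confidence) for every word in a result dict."""
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            yield word["text"], word["start"], word["end"], word.get("confidence")


def low_confidence_words(result, threshold=0.5):
    """Return the words whose confidence is below `threshold` (an arbitrary cutoff)."""
    return [w for w in iter_words(result) if w[3] is not None and w[3] < threshold]


def transcribe_file(path, model_name="tiny", language=None):
    """Run the package on an audio file; requires whisper-timestamped to be installed."""
    import whisper_timestamped as whisper  # imported lazily so the helpers above work without it

    audio = whisper.load_audio(path)
    model = whisper.load_model(model_name, device="cpu")
    return whisper.transcribe(model, audio, language=language)


# Hand-written sample in the assumed output shape (not real model output).
sample = {
    "segments": [
        {
            "words": [
                {"text": "Bonjour", "start": 0.0, "end": 0.42, "confidence": 0.93},
                {"text": "monde", "start": 0.5, "end": 0.9, "confidence": 0.41},
            ]
        }
    ]
}

flagged = low_confidence_words(sample)
```

Keeping the traversal separate from the transcription call means the same helpers can be reused on results loaded from saved JSON files.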
How It Works
The core innovation lies in using Dynamic Time Warping (DTW) on Whisper's cross-attention weights to derive word-level alignments. This approach avoids the need for language-specific models or character normalization required by other methods, and it performs alignment on-the-fly without additional inference steps, optimizing memory usage for long audio files.
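The alignment idea can be sketched on a toy example. The code below is a simplified illustration, not the project's implementation: it runs textbook DTW over a small hand-made (tokens x frames) attention matrix and reads each token's time span off the resulting path. The function names (`dtw_path`, `token_spans`), the cost definition (one minus the max-normalized attention), and the 0.02 s frame duration are assumptions made for this sketch.

```python
import numpy as np


def dtw_path(cost):
    """Return the minimum-cost monotonic path through a (tokens x frames) cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    # Forward pass: accumulate the cheapest way to reach each cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backward pass: follow the cheapest predecessor from the last cell to the first.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def token_spans(attention, seconds_per_frame=0.02):
    """Map each token to a (start, end) time in seconds using the DTW path."""
    cost = 1.0 - attention / attention.max()  # high attention -> low cost
    spans = {}
    for token, frame in dtw_path(cost):
        first, last = spans.get(token, (frame, frame))
        spans[token] = (min(first, frame), max(last, frame))
    return [
        (spans[t][0] * seconds_per_frame, (spans[t][1] + 1) * seconds_per_frame)
        for t in range(attention.shape[0])
    ]


# Toy attention matrix: 3 tokens, 6 audio frames, each token attending to 2 frames.
attention = np.array([
    [0.9, 0.9, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.9, 0.9],
])
spans = token_spans(attention)
```

In this toy case each token ends up aligned to the two frames it attends to most. Word-level timestamps would then follow by merging the spans of the tokens that make up each word.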
Quick Start & Requirements
pip3 install whisper-timestamped
Additional packages: matplotlib, torchaudio, onnxruntime for VAD, transformers for Hugging Face models.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Dependencies: openai-whisper (MIT) and dtw-python (GPL v3). The GPL v3 license of dtw-python may impose copyleft restrictions on derivative works.
Limitations & Caveats