whisper-timestamped by linto-ai

ASR tool for word-level timestamps and confidence scores using Whisper

created 2 years ago
2,530 stars

Top 18.9% on sourcepulse

Project Summary

This project provides word-level timestamps and confidence scores for multilingual Automatic Speech Recognition (ASR) using OpenAI's Whisper models. It addresses the limitation of Whisper's segment-level timestamps by providing finer-grained, per-word timing for researchers and developers working with speech data.

How It Works

The core idea is to apply Dynamic Time Warping (DTW) to Whisper's cross-attention weights to derive word-level alignments. Unlike other methods, this approach needs neither language-specific models nor character normalization. Alignment is performed on the fly, with no additional inference steps, which keeps memory usage low for long audio files.
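As a rough illustration of the alignment idea, the sketch below runs a textbook DTW over a small, made-up token-by-frame attention matrix. It is not the project's implementation: the function names, the `1 - attention` cost, the toy matrix, and the 0.02 s frame duration are all assumptions chosen for illustration.

```python
def dtw_path(cost):
    """Cheapest monotonic path from (0, 0) to (n-1, m-1) through a cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] holds the cheapest accumulated cost of reaching cell (i-1, j-1).
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )
    # Backtrack from the last cell, always moving to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def token_frame_spans(path):
    """Collapse a (token, frame) path into one (first_frame, last_frame) per token."""
    spans = {}
    for token, frame in path:
        first, last = spans.get(token, (frame, frame))
        spans[token] = (min(first, frame), max(last, frame))
    return [spans[token] for token in sorted(spans)]


def token_times(spans, frame_seconds=0.02):
    """Convert frame spans to (start, end) seconds; frame_seconds is an assumed value."""
    return [
        (round(first * frame_seconds, 3), round((last + 1) * frame_seconds, 3))
        for first, last in spans
    ]


# Toy attention weights: 3 text tokens (rows) attending to 6 audio frames (columns).
attention = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.1, 0.1, 0.9, 1.0],
]
# High attention should be cheap to traverse, so use 1 - weight as the cost.
cost = [[1.0 - weight for weight in row] for row in attention]
path = dtw_path(cost)
spans = token_frame_spans(path)
print(token_times(spans))
```

Because the path is forced to be monotonic, each token ends up with a contiguous run of frames, which is what lets a start and end time be read off per token and then merged into words.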

Quick Start & Requirements

  • Install: pip3 install whisper-timestamped
  • Prerequisites: Python >= 3.9, ffmpeg. Optional: matplotlib, torchaudio, onnxruntime for VAD, transformers for Hugging Face models.
  • Docker: Provided for CPU-only and full installations.
  • Docs: https://github.com/linto-ai/whisper-timestamped
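Once installed, a minimal usage sketch might look like the following. The `load_audio`, `load_model`, and `transcribe` calls and the `segments` / `words` result layout follow the project's README as best understood here; the helper names `transcribe_words` and `iter_words`, the `tiny` model choice, and the CPU device are illustrative assumptions, so check the linked docs before relying on them.

```python
def iter_words(result):
    """Yield (text, start, end, confidence) for every word in a transcription result.

    Assumes the result is a dict with a "segments" list, each segment holding
    a "words" list of dicts with "text", "start", "end" and "confidence" keys.
    """
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            yield word["text"], word["start"], word["end"], word.get("confidence")


def transcribe_words(audio_path, model_name="tiny", **options):
    """Transcribe one file and return its words as a flat list (hypothetical helper)."""
    # Imported lazily so iter_words stays usable without the package installed.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name, device="cpu")
    # Extra keyword options (for example language="fr") are forwarded unchanged.
    result = whisper.transcribe(model, audio, **options)
    return list(iter_words(result))
```

Calling `transcribe_words("AUDIO.wav", language="fr")`, where `AUDIO.wav` is a placeholder path, would then return one tuple per word. Keeping the flattening step separate from the model call makes it easy to reuse on results that were saved to JSON earlier.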

Highlighted Details

  • Word-level timestamps and confidence scores.
  • Optional Voice Activity Detection (VAD) to prevent hallucinations.
  • Support for detecting and marking speech disfluencies.
  • Compatible with OpenAI Whisper and Hugging Face models.
  • Outputs include JSON, CSV, SRT, and VTT formats with word timestamps.
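To make the word-level subtitle output concrete, here is a small sketch that turns a list of timed words into SRT cues, one word per cue. It is a stand-in written for illustration, not the project's own writer: the function names and the one-word-per-cue layout are assumptions, and the input is assumed to be dicts with `text`, `start`, and `end` keys (times in seconds).

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    milliseconds = int(round(seconds * 1000))
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def words_to_srt(words):
    """Build SRT text with one numbered cue per word."""
    cues = []
    for index, word in enumerate(words, start=1):
        start = srt_timestamp(word["start"])
        end = srt_timestamp(word["end"])
        cues.append(f"{index}\n{start} --> {end}\n{word['text']}\n")
    # Cues are separated by a blank line, as SRT players expect.
    return "\n".join(cues)


example_words = [
    {"text": "Hello", "start": 0.0, "end": 0.48},
    {"text": "world", "start": 0.48, "end": 1.0},
]
print(words_to_srt(example_words))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift such as a cue ending at `,479` instead of `,480`.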

Maintenance & Community

  • Primarily developed by Jérôme Louradour.
  • Based on openai-whisper (MIT) and dtw-python (GPL v3).

Licensing & Compatibility

  • The project itself is not explicitly licensed in the README. However, it depends on openai-whisper (MIT) and dtw-python (GPL v3). The GPL v3 license of dtw-python may impose copyleft restrictions on derivative works.

Limitations & Caveats

  • The README states the extension is for "experimental purposes" and may "significantly impact performance."
  • The GPL v3 dependency might restrict commercial use or linking with closed-source applications.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 159 stars in the last 90 days
