SDK for enhanced audio transcription using OpenAI's Whisper
This library enhances OpenAI's Whisper for more accurate transcription timestamps and advanced audio processing. It's designed for researchers and developers needing precise control over ASR output, offering features like silence suppression, word-level alignment, and flexible output formatting.
How It Works
Stable-ts modifies Whisper's decoding process to improve timestamp reliability. It incorporates advanced post-processing techniques, including Voice Activity Detection (VAD) and custom regrouping algorithms, to refine segment boundaries and word timings. The library also supports various audio preprocessing steps like noise removal and frequency filtering.
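The silence-aware refinement described above can be illustrated with a small, library-independent sketch. This is not stable-ts's actual implementation: the `Word` class, the `suppress_silence` function, and the `min_duration` parameter are hypothetical names introduced here, and the silent intervals are assumed to come from some external VAD step. The idea shown is only that a word boundary falling inside a detected silence gets snapped to the nearest edge of that silence.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def suppress_silence(words, silences, min_duration=0.02):
    """Snap word boundaries out of silent intervals.

    `silences` is a list of (start, end) tuples in seconds, assumed to be
    produced by a separate voice-activity detector.
    """
    adjusted = []
    for w in words:
        start, end = w.start, w.end
        for s_start, s_end in silences:
            # Word begins inside a silent span: push its start to where sound resumes.
            if s_start <= start < s_end:
                start = min(s_end, end)
            # Word ends inside a silent span: pull its end back to where sound stopped.
            if s_start < end <= s_end:
                end = max(s_start, start)
        # If the adjustment collapsed the word, keep the original timing instead.
        if end - start < min_duration:
            start, end = w.start, w.end
        adjusted.append(Word(w.text, start, end))
    return adjusted


if __name__ == "__main__":
    words = [Word("hello", 0.0, 1.0), Word("world", 1.2, 2.5)]
    silences = [(0.0, 0.4), (2.0, 3.0)]
    for w in suppress_silence(words, silences):
        print(f"{w.text}: {w.start:.2f} to {w.end:.2f}")
```

In this toy input, "hello" starts inside the leading silence and "world" ends inside the trailing one, so both are trimmed to the voiced region. A word lying entirely inside a silence is left untouched rather than collapsed to zero length, which is a design choice of this sketch and not a claim about the library's behavior.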
Quick Start & Requirements
pip install -U stable-ts
stable-ts audio.mp3 -o audio.srt
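The same transcription can presumably be driven from Python. The sketch below assumes the package is importable as `stable_whisper` and exposes `load_model()`, `transcribe()`, and `to_srt_vtt()`, as in the project's README at the time of writing; none of these calls is confirmed by this summary, so check them against your installed version. The wrapper name `transcribe_to_srt` and the default model size `"base"` are illustrative choices, not part of the library.

```python
def transcribe_to_srt(audio_path: str, srt_path: str, model_name: str = "base") -> None:
    """Transcribe an audio file and write the result as an SRT file.

    Assumes the stable-ts Python API described in the lead-in above.
    """
    # Imported lazily so this function can be defined without stable-ts installed.
    import stable_whisper

    # Load a Whisper model wrapped with stable-ts's timestamp handling.
    model = stable_whisper.load_model(model_name)
    # Run transcription; the result object carries segment and word timings.
    result = model.transcribe(audio_path)
    # Write subtitles; the output format is assumed to follow the file extension.
    result.to_srt_vtt(srt_path)


if __name__ == "__main__":
    transcribe_to_srt("audio.mp3", "audio.srt")
```

Running the script requires the package from the install command above plus a downloadable Whisper model, so the first call may take a while.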
Highlighted Details
refine() and adjust_gaps() for precise timestamp tuning.
align() and align_words().
Maintenance & Community
The project is actively maintained by jianfch. Community support channels are not explicitly mentioned in the README.
Licensing & Compatibility
Limitations & Caveats
Refinement operations (refine()) are significantly slower when used with Faster-Whisper models compared to standard Whisper models.