stable-ts by jianfch

SDK for enhanced audio transcription using OpenAI's Whisper

Created 3 years ago

2,128 stars

Top 20.8% on SourcePulse

1 Expert Loves This Project: jxnl (Author of Instructor)

Project Summary

This library extends OpenAI's Whisper to produce more reliable transcription timestamps and adds advanced audio processing. It is designed for researchers and developers who need precise control over ASR output, and offers silence suppression, word-level alignment, and flexible output formatting.

How It Works

Stable-ts modifies Whisper's decoding process to improve timestamp reliability. It then applies post-processing, including Voice Activity Detection (VAD) and custom regrouping algorithms, to refine segment boundaries and word timings. The library also supports audio preprocessing steps such as noise removal and frequency filtering.
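
The idea behind silence suppression can be illustrated with a deliberately simplified sketch. It uses a plain RMS energy threshold rather than the detection stable-ts actually performs, and the helper names, the 20 ms frame size, and the 0.01 threshold are hypothetical choices made for illustration. Frames quieter than the threshold count as silence, and word boundaries that land inside a silent span are moved to that span's edge.

```python
import math

# Hypothetical sketch: a simple energy threshold, NOT stable-ts's own
# silence suppression or VAD implementation.

def silent_intervals(samples, sample_rate, frame_ms=20, threshold=0.01):
    """Return (start_s, end_s) spans whose per-frame RMS is below `threshold`."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        t = i / sample_rate
        if rms < threshold:
            if start is None:
                start = t  # a silent run begins at this frame
        elif start is not None:
            spans.append((start, t))  # a silent run ends at this frame
            start = None
    if start is not None:
        spans.append((start, len(samples) / sample_rate))
    return spans


def suppress_silence(words, spans):
    """Move (word, start, end) boundaries that fall inside silence to its edge."""
    adjusted = []
    for word, start, end in words:
        for s, e in spans:
            if s <= start < e:
                start = min(e, end)  # push the start forward past the silence
            if s < end <= e:
                end = max(s, start)  # pull the end back to where silence begins
        adjusted.append((word, start, end))
    return adjusted
```

A real pipeline would also have to tolerate brief energy dips inside speech and background noise above the threshold; this sketch ignores both.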

Quick Start & Requirements

Install: pip install -U stable-ts
Prerequisites: FFmpeg (on your PATH) and PyTorch (install a GPU-enabled build separately if you need GPU support).
Usage: stable-ts audio.mp3 -o audio.srt
Documentation: https://github.com/jianfch/stable-ts
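
The Quick Start command writes subtitles in SRT, a plain-text format of numbered cues in which each cue has a `start --> end` line with HH:MM:SS,mmm timestamps followed by its text. The hypothetical helpers below render (start, end, text) segments in that layout to show what such an output file contains; they are not stable-ts's own writer.

```python
# Hypothetical helpers illustrating the SRT layout; not part of stable-ts.

def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)
```

For example, `srt_timestamp(3661.5)` yields `01:01:01,500`.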

Highlighted Details

Timestamp Refinement: Offers methods like refine() and adjust_gaps() for precise timestamp tuning.
Regrouping: Advanced algorithms to restructure segments based on punctuation, gaps, length, or duration.
Alignment: Align existing text to audio with align() and align_words().
Multi-Model Support: Integrates with Whisper, Faster-Whisper, Hugging Face Transformers, and MLX.
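
Two of the regrouping criteria listed above, gaps and punctuation, can be sketched over word tuples of (text, start, end). This is a hypothetical illustration, not stable-ts's regrouping algorithm or its defaults; the 0.5 s `max_gap` and the punctuation set are assumed values.

```python
# Hypothetical regrouping sketch; stable-ts's own algorithms and defaults differ.

def split_by_gap(words, max_gap=0.5):
    """Start a new segment wherever the silence between words exceeds max_gap seconds."""
    segments = []
    current = []
    for word in words:
        if current and word[1] - current[-1][2] > max_gap:
            segments.append(current)
            current = []
        current.append(word)
    if current:
        segments.append(current)
    return segments


def split_by_punctuation(segment, marks=(".", "?", "!")):
    """Split a segment after any word whose text ends in one of `marks`."""
    parts, current = [], []
    for word in segment:
        current.append(word)
        if word[0].rstrip().endswith(marks):
            parts.append(current)
            current = []
    if current:
        parts.append(current)
    return parts


# Usage: split on long gaps first, then on sentence-ending punctuation.
words = [
    ("Hi.", 0.0, 0.4),
    ("How", 0.5, 0.7),
    ("are", 0.75, 0.9),
    ("you?", 0.95, 1.2),
    ("Fine", 2.0, 2.3),
]
regrouped = [part for seg in split_by_gap(words) for part in split_by_punctuation(seg)]
print([" ".join(w[0] for w in part) for part in regrouped])
# → ['Hi.', 'How are you?', 'Fine']
```

Applying the gap split before the punctuation split keeps long pauses as hard boundaries even when no punctuation marks them.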

Maintenance & Community

The project is actively maintained by jianfch. Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

License: MIT License.
Compatibility: Compatible with commercial and closed-source applications.

Limitations & Caveats

Refinement with refine() is significantly slower on Faster-Whisper models than on standard Whisper models.

Health Check

Last Commit

2 months ago

Responsiveness

1 day

Pull Requests (30d)

0

Issues (30d)

1

Star History

33 stars in the last 30 days

Explore Similar Projects

whisper-at by YuanGongND

Joint audio tagging and speech recognition model

Created 2 years ago

Updated 1 year ago

LiveWhisper by Nikorasu

Live transcription tool using OpenAI's Whisper

Created 3 years ago

Updated 5 months ago

Starred by Georgi Gerganov (Author of llama.cpp, whisper.cpp).

transcriber_app by davabase

Real-time speech-to-text transcription app

Created 3 years ago

Updated 3 years ago

speech-to-text by reriiasu

Real-time transcription tool using faster-whisper

Created 2 years ago

Updated 1 year ago

whisper-ctranslate2 by Softcatala

CLI tool for faster Whisper transcription/translation

Created 2 years ago

Updated 4 weeks ago

faster-whisper-GUI by CheshireCC

GUI for faster-whisper/whisperX transcription

Created 2 years ago

Updated 1 year ago

whisper_mic by mallorbc

Microphone interface for OpenAI's Whisper speech-to-text model

Created 3 years ago

Updated 1 year ago

Starred by Pietro Schirano (Founder of MagicPath), Luis Capelo (Cofounder of Lightning AI), and 1 more.

whisper_real_time by davabase

Demo for real-time speech-to-text using OpenAI's Whisper

Created 3 years ago

Updated 9 months ago

Starred by Tim J. Baek (Founder of Open WebUI), Gabriel Almeida (Cofounder of Langflow), and 2 more.

whisper-diarization by MahmoudAshraf97

ASR pipeline for speaker diarization

Created 3 years ago

Updated 1 month ago

Starred by Luis Capelo (Cofounder of Lightning AI) and Matt Schrage (Cofounder of Fig).

WhisperLive by collabora

Real-time transcription app using OpenAI's Whisper

Created 2 years ago

Updated 3 months ago

whisper-asr-webservice by ahmetoner

ASR webservice API for speech recognition, translation, and language ID

Created 3 years ago

Updated 1 month ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Travis Fischer (Founder of Agentic).

RealtimeSTT by KoljaB

Speech-to-text library for realtime applications

Created 2 years ago

Updated 6 months ago
