Python package for audio segmentation and utterance alignment
This Python package provides CTC segmentation for aligning audio files with text, enabling utterance-level segmentation and timestamp extraction. It is designed for researchers and developers working with large audio datasets and end-to-end ASR systems.
How It Works
The core of the package is a three-step process:
1. Forward Propagation: Character probabilities from a CTC-based neural network are used to build a trellis diagram. Transition costs are zero for start-of-sentence or blank tokens, so that a preamble before the first utterance can be skipped.
2. Backtracking: The most probable path of characters through all time steps is determined, starting from the time step at which the last character has its highest probability.
3. Confidence Score: A confidence score for each utterance is derived from the character alignment probabilities, which helps detect and filter low-quality segments.
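The three steps above can be illustrated with a small, dependency-free sketch. This is not the package's implementation: the function names (`build_trellis`, `backtrack`, `utterance_confidence`), the toy probabilities, and the use of a plain mean log-probability as the confidence score are assumptions made for illustration. It also ignores repeated-character emissions and windowing, and assumes there are at least as many frames as characters.

```python
import math

def build_trellis(log_probs, tokens, blank=0):
    """Forward pass: trellis[t][j] is the best log-probability of having
    aligned the first j tokens within the first t frames."""
    n_frames, n_tokens = len(log_probs), len(tokens)
    trellis = [[-math.inf] * (n_tokens + 1) for _ in range(n_frames + 1)]
    for t in range(n_frames + 1):
        trellis[t][0] = 0.0  # zero cost before the first token (free preamble)
    for t in range(1, n_frames + 1):
        frame = log_probs[t - 1]
        for j in range(1, n_tokens + 1):
            stay = trellis[t - 1][j] + frame[blank]               # wait on blank
            move = trellis[t - 1][j - 1] + frame[tokens[j - 1]]   # emit token j
            trellis[t][j] = max(stay, move)
    return trellis

def backtrack(trellis, log_probs, tokens, blank=0):
    """Backward pass: return (token_position, frame_index, log_prob) per token."""
    n_frames, n_tokens = len(trellis) - 1, len(tokens)
    # Start where the complete token sequence is most probable.
    t = max(range(1, n_frames + 1), key=lambda i: trellis[i][n_tokens])
    j = n_tokens
    path = []
    while j > 0:
        frame = log_probs[t - 1]
        stay = trellis[t - 1][j] + frame[blank]
        move = trellis[t - 1][j - 1] + frame[tokens[j - 1]]
        if move >= stay:
            path.append((j - 1, t - 1, frame[tokens[j - 1]]))
            j -= 1
        t -= 1
    path.reverse()
    return path

def utterance_confidence(path, start, end):
    """Simplified score: mean log-probability of tokens start..end-1."""
    scores = [lp for pos, _, lp in path if start <= pos < end]
    return sum(scores) / len(scores)

# Toy example: vocabulary is 0 = blank, 1 = "a", 2 = "b"; five audio frames.
probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]
tokens = [1, 2]  # the text "ab"

trellis = build_trellis(log_probs, tokens)
path = backtrack(trellis, log_probs, tokens)
confidence = utterance_confidence(path, 0, len(tokens))
frames = [frame for _, frame, _ in path]
print(frames, math.exp(confidence))
```

Here "a" is aligned to frame 1 and "b" to frame 3. Multiplying a frame index by the duration of one network output frame would give a timestamp in seconds. Utterances whose score falls below a chosen threshold could then be discarded.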
Quick Start & Requirements
pip install ctc-segmentation
wav2vec2-large-xlsr-53-english
is provided in the README.
Highlighted Details
min_window_size and blank_transition_cost_zero.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
1 year ago
Inactive