Speech recognition model based on an efficient Transformer architecture
Squeezeformer offers an efficient Transformer architecture for Automatic Speech Recognition (ASR), targeting researchers and practitioners in speech processing. It aims to provide high accuracy with reduced computational cost compared to standard Transformer models.
How It Works
Squeezeformer introduces a novel "squeeze" operation that reduces the sequence length before self-attention is applied. Because attention cost grows quadratically with sequence length, this shortening makes long audio sequences substantially cheaper to process while retaining accuracy.
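As a rough sketch of this idea (not the project's actual implementation), the TensorFlow snippet below downsamples the time axis by 2x with average pooling before running multi-head self-attention; the pooling operator, dimensions, and layer choices are all illustrative assumptions.

```python
import tensorflow as tf

# Illustrative sketch only: halve the time axis with stride-2 pooling
# before self-attention. Attention cost is O(T^2) in sequence length,
# so a 2x "squeeze" cuts that cost roughly 4x.
seq_len, d_model = 1000, 144                        # assumed toy dimensions
x = tf.random.normal([1, seq_len, d_model])         # (batch, time, features)

# Squeeze: downsample the sequence (pooling stands in for the real operator)
squeezed = tf.keras.layers.AveragePooling1D(pool_size=2, strides=2)(x)

# Self-attention now runs over 500 steps instead of 1000
attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=d_model // 4)
y = attn(squeezed, squeezed)

print(x.shape, "->", y.shape)                       # (1, 1000, 144) -> (1, 500, 144)
```

Halving the sequence length roughly quarters the attention cost, which is where the efficiency gain in this design comes from.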
Quick Start & Requirements
Install with pip install -e '.[tf2.5]' (CPU) or pip install -e '.[tf2.5-gpu]' (GPU). CTC decoders are set up with the install_ctc_decoders.sh script, and LibriSpeech transcripts are generated with the create_librispeech_trans_all.py script.
Highlighted Details
Maintenance & Community
The last update was 2 years ago, and the project is inactive.
Licensing & Compatibility
Limitations & Caveats
The project explicitly requires TensorFlow 2.5, a now-dated release. The README focuses on testing and inference and does not detail the training process.