Speech recognition model based on an efficient Transformer architecture
Squeezeformer offers an efficient Transformer architecture for Automatic Speech Recognition (ASR), targeting researchers and practitioners in speech processing. It aims to provide high accuracy with reduced computational cost compared to standard Transformer models.
How It Works
Squeezeformer introduces a novel "squeeze" operation that reduces the sequence length before self-attention is applied. Because attention cost grows quadratically with sequence length, this shortening makes long audio sequences substantially cheaper to process while retaining accuracy.
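As a rough sketch of this idea (not the project's actual implementation), the TensorFlow snippet below downsamples the time axis by 2x with average pooling before running multi-head self-attention; the pooling operator, dimensions, and layer choices are all illustrative assumptions.

```python
import tensorflow as tf

# Illustrative sketch only: halve the time axis with stride-2 pooling
# before self-attention. Attention cost is O(T^2) in sequence length,
# so a 2x "squeeze" cuts that cost roughly 4x.
seq_len, d_model = 1000, 144                        # assumed toy dimensions
x = tf.random.normal([1, seq_len, d_model])         # (batch, time, features)

# Squeeze: downsample the sequence (pooling stands in for the real operator)
squeezed = tf.keras.layers.AveragePooling1D(pool_size=2, strides=2)(x)

# Self-attention now runs over 500 steps instead of 1000
attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=d_model // 4)
y = attn(squeezed, squeezed)

print(x.shape, "->", y.shape)                       # (1, 1000, 144) -> (1, 500, 144)
```

Halving the sequence length roughly quarters the attention cost, which is where the efficiency gain in this design comes from.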
Quick Start & Requirements
Install with pip install -e '.[tf2.5]' (CPU) or pip install -e '.[tf2.5-gpu]' (GPU). CTC decoders are set up with the install_ctc_decoders.sh script, and LibriSpeech transcripts are generated with the create_librispeech_trans_all.py script.
Highlighted Details
Maintenance & Community
The last update was 2 years ago, and the project is inactive.
Licensing & Compatibility
Limitations & Caveats
The project explicitly requires TensorFlow 2.5, a now-dated release. The README focuses on testing and inference and does not detail the training process.