Squeezeformer by kssteven418

Speech recognition model based on an efficient Transformer architecture

created 3 years ago
258 stars

Top 98.6% on sourcepulse

View on GitHub
Project Summary

Squeezeformer offers an efficient Transformer architecture for automatic speech recognition (ASR), targeting researchers and practitioners in speech processing. It aims to deliver higher accuracy at lower computational cost than the Conformer models it builds on.

How It Works

Squeezeformer reworks the Conformer encoder around a temporal U-Net structure: partway through the network the feature sequence is downsampled (the "squeeze"), so subsequent attention layers operate at half the temporal resolution, and an upsampling layer with a skip connection recovers the full resolution at the end. Because self-attention cost grows quadratically with sequence length, halving the length cuts attention compute roughly fourfold in the squeezed layers, enabling efficient processing of long audio sequences while retaining accuracy.
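To make the mechanism concrete, here is a minimal TensorFlow sketch of the squeeze-attend-unsqueeze pattern. It is an illustration under assumed layer choices and dimensions, not the repository's actual implementation.

# Illustrative sketch only (not the repo's actual code): downsample the
# sequence before attention, then upsample back. Layer choices and sizes
# are assumptions for demonstration.
import tensorflow as tf

seq_len, d_model, num_heads = 1024, 144, 4

inputs = tf.keras.Input(shape=(seq_len, d_model))

# "Squeeze": a strided convolution halves the temporal resolution,
# so the attention below runs over 512 frames instead of 1024.
squeezed = tf.keras.layers.Conv1D(
    filters=d_model, kernel_size=3, strides=2, padding="same")(inputs)

# Self-attention at half resolution: roughly 4x cheaper than at full
# length, since attention cost scales with the square of the sequence.
attn = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=d_model // num_heads)(squeezed, squeezed)

# "Unsqueeze": recover the original temporal resolution and add a
# skip connection from the full-resolution input.
upsampled = tf.keras.layers.UpSampling1D(size=2)(attn)
outputs = tf.keras.layers.Add()([inputs, upsampled])

model = tf.keras.Model(inputs, outputs)
model.summary()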

Quick Start & Requirements

  • Install: pip install -e '.[tf2.5]' (CPU) or pip install -e '.[tf2.5-gpu]' (GPU).
  • Prerequisites: Python 3.8 and TensorFlow 2.5; the CTC decoders must be installed via the install_ctc_decoders.sh script.
  • Dataset: the 960-hour LibriSpeech dataset is required; manifest files are created with create_librispeech_trans_all.py (a reader sketch follows this list).
  • Links: Paper, NeMo Support, Checkpoints
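For illustration only, the sketch below reads a manifest assuming a tab-separated (audio_path, duration, transcript) layout; the actual columns emitted by create_librispeech_trans_all.py may differ, so treat both the format and the file name as hypothetical.

# Hypothetical manifest reader: the tab-separated (path, duration,
# transcript) layout is an assumption, not confirmed by the README.
import csv

def read_manifest(path):
    """Yield (audio_path, duration_seconds, transcript) tuples."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        for audio_path, duration, transcript in reader:
            yield audio_path, float(duration), transcript

# "train_manifest.tsv" is a placeholder file name.
for audio_path, dur, text in read_manifest("train_manifest.tsv"):
    print(f"{audio_path} ({dur:.1f}s): {text}")
    break  # show just the first entry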

Highlighted Details

  • Achieves low word error rates (WER) on LibriSpeech: Squeezeformer-L reports 2.47% (test-clean) and 5.97% (test-other); a WER computation sketch follows this list.
  • Pre-trained checkpoints are available for model sizes from XS to L.
  • Supports evaluation on the four LibriSpeech subsets (dev-clean, dev-other, test-clean, test-other).
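For context on the WER figures above, here is a minimal sketch of computing word error rate with the third-party jiwer package, which is an assumption here and not part of this repository.

# Minimal WER computation using the third-party jiwer package
# (an assumption; not part of the Squeezeformer repo).
import jiwer

references = ["the quick brown fox jumps over the lazy dog"]
hypotheses = ["the quick brown fox jumped over a lazy dog"]

# WER = (substitutions + deletions + insertions) / reference words
wer = jiwer.wer(references, hypotheses)
print(f"WER: {wer:.2%}")  # 2 substitutions over 9 words -> ~22.22%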

Maintenance & Community

  • The project accompanies a NeurIPS 2022 paper.
  • Third-party implementations are available in PyTorch and WeNet.

Licensing & Compatibility

  • The software and data were deposited in the BAIR Open Research Commons Repository on 02/07/23. The README does not state a license; check the repository for a LICENSE file before depending on it.

Limitations & Caveats

The project pins TensorFlow 2.5 and Python 3.8, both of which are dated and may conflict with modern environments. The README covers testing and inference only; the training process is not documented.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days

Explore Similar Projects

Starred by Nat Friedman (former CEO of GitHub), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 6 more.

FasterTransformer by NVIDIA

Optimized transformer library for inference

0.2% · 6k stars · created 4 years ago · updated 1 year ago