CAT  by thu-spmi

ASR toolkit for data-efficient end-to-end speech recognition

created 5 years ago
337 stars

Top 82.8% on sourcepulse

Project Summary

CAT is a toolkit for data-efficient end-to-end Automatic Speech Recognition (ASR), targeting researchers and practitioners seeking to combine the benefits of hybrid and end-to-end ASR approaches. It offers a complete workflow for Conditional Random Field (CRF)-based ASR, aiming for improved performance with less data.

How It Works

CAT builds on a Conditional Random Field (CRF) framework whose state topology is inspired by Connectionist Temporal Classification (CTC). The approach combines globally normalized modeling with discriminative training, bridging the gap between modular hybrid systems and unified end-to-end neural networks. The stated advantage is better data efficiency, and potentially lower latency, from balancing modularity against joint optimization.
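To make the CTC-inspired topology more concrete: in standard CTC, a frame-level path is mapped to a label sequence by merging consecutive repeats and then dropping blank symbols. The sketch below illustrates only that mapping in plain Python. The function name `ctc_collapse` and the `<b>` blank symbol are illustrative assumptions, not identifiers from CAT's code, and the CRF normalization itself is not shown.

```python
from itertools import groupby

BLANK = "<b>"  # illustrative blank symbol (an assumption, not CAT's identifier)


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive symbols.
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != blank]


# One symbol per acoustic frame; a blank between repeats keeps them distinct.
path = ["<b>", "c", "c", "<b>", "a", "a", "<b>", "<b>", "t"]
print(ctc_collapse(path))
```

Many different frame-level paths collapse to the same label sequence, which is why training sums over all paths consistent with the transcript.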

Quick Start & Requirements

  • Install: run git clone https://github.com/thu-spmi/CAT.git && cd CAT, then run ./install.sh.
  • Dependencies: PyTorch >= 1.9.0 and a CUDA-capable NVIDIA GPU with the NVIDIA driver and CUDA libraries installed. Kaldi is optional but recommended for CTC-CRF training and data preparation. Torchaudio can be used as an alternative for feature extraction.
  • Further guidance is available in the TEMPLATE and data.sh files.
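Before running the installer, it can help to confirm that the listed prerequisites are in place. The following is a minimal sketch, not part of CAT: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the check covers only the PyTorch version and CUDA visibility mentioned above.

```python
import re

MIN_TORCH = (1, 9, 0)  # minimum PyTorch version listed in the requirements


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.1') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(version, minimum=MIN_TORCH):
    """Return True if the version string is at least the given minimum."""
    return parse_version(version) >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means no issue found."""
    try:
        import torch
    except ImportError:
        return ["PyTorch is not installed"]
    problems = []
    if not meets_minimum(torch.__version__):
        problems.append(f"PyTorch {torch.__version__} is older than 1.9.0")
    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

An empty result only means these two checks passed; it does not verify Kaldi or the compiled CTC-CRF extension.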

Highlighted Details

  • Full CUDA/C/C++ implementation of the CTC-CRF loss function, with PyTorch bindings.
  • Supports one-stop training and inference for CTC, CTC-CRF, RNN-T, and language models (LM).
  • Flexible configuration via JSON files.
  • Scalable and extensible for large datasets and custom models.
  • Achieves competitive performance on various benchmarks (e.g., 2.77% WER on WSJ eval92).
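For readers unfamiliar with the metric quoted above, word error rate (WER) is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below shows the standard calculation for a single utterance; `word_error_rate` is an illustrative helper, not CAT's scoring code, and benchmark figures are normally computed over a whole test set by summing errors and reference words across utterances.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.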

Maintenance & Community

The project is associated with the Speech Processing and Machine Intelligence (SPMI) group at Tsinghua University. Key publications are cited, indicating academic backing.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The toolkit requires a CUDA-enabled NVIDIA GPU. While Kaldi is optional, its absence might limit certain advanced CTC-CRF training functionalities. The licensing status requires clarification for commercial applications.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days
