ASR toolkit for CPU/edge deployment, approaching GPU model performance
This project provides a TensorFlow 2 implementation of an end-to-end Automatic Speech Recognition (ASR) system based on the Conformer architecture. It targets researchers and developers who need high-performance ASR on CPU, achieves a real-time factor (RTF) below 0.1, and includes voice activity detection (VAD), noise reduction, punctuation restoration, and TTS-based data augmentation to improve recognition accuracy.
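For readers unfamiliar with the metric, RTF is the time spent processing divided by the duration of the audio, so an RTF below 0.1 means ten seconds of speech is recognized in under one second. The sketch below shows one way to measure it; `recognize` is a hypothetical stand-in for whatever inference function is being timed, not a function from this project.

```python
import time
from typing import Any, Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(recognize: Callable[[Any], Any], audio: Any, audio_seconds: float) -> float:
    """Time one call to `recognize` (a hypothetical inference function)."""
    start = time.perf_counter()
    recognize(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# 0.5 s of compute for 10 s of audio
print(real_time_factor(0.5, 10.0))  # prints 0.05
```

In practice the measurement is usually averaged over many utterances, since a single call can be skewed by warm-up costs.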
How It Works
The system uses a CTC-based architecture (described as "CTC + translate") with a Conformer backbone, an encoder design that captures both local and global dependencies in speech. It supports both online streaming and offline recognition. Streaming is implemented with Block Conformer (with Global CTC) for short utterances and Chunk Conformer (with CTC Picker) for longer ones. The project also includes a Mel layer implemented in TensorFlow 2 that mirrors librosa's functionality, and it supports ONNX for C++ and Python inference.
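To make the CTC and streaming ideas concrete, here is a minimal sketch of generic greedy CTC decoding, applied once to a whole utterance and once chunk by chunk. It is not the project's Global CTC or CTC Picker logic; the blank index of 0 and the toy frame labels are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

BLANK = 0  # assumed blank index; the project's vocabulary may differ


def ctc_greedy_collapse(
    frame_ids: Sequence[int], prev: Optional[int] = None
) -> Tuple[List[int], Optional[int]]:
    """Collapse repeated labels, then drop blanks.

    `prev` is the last frame label of the previous chunk (None at the start),
    so a label that continues across a chunk boundary is not emitted twice.
    """
    out: List[int] = []
    for label in frame_ids:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return out, prev


def decode_offline(frame_ids: Sequence[int]) -> List[int]:
    """Decode the whole utterance in one pass."""
    tokens, _ = ctc_greedy_collapse(frame_ids)
    return tokens


def decode_streaming(frame_ids: Sequence[int], chunk_size: int) -> List[int]:
    """Decode fixed-size chunks, carrying the last label between chunks."""
    tokens: List[int] = []
    prev: Optional[int] = None
    for start in range(0, len(frame_ids), chunk_size):
        chunk = frame_ids[start:start + chunk_size]
        new_tokens, prev = ctc_greedy_collapse(chunk, prev)
        tokens.extend(new_tokens)
    return tokens


# Toy per-frame argmax labels (0 = blank)
frames = [0, 3, 3, 0, 3, 5, 5, 5, 0, 0, 7]
print(decode_offline(frames))                  # prints [3, 3, 5, 7]
print(decode_streaming(frames, chunk_size=4))  # prints [3, 3, 5, 7]
```

Carrying `prev` across chunk boundaries keeps the streaming result identical to the offline one for this simple decoder. A real streaming encoder also has to limit how much future context each chunk sees, which this sketch does not model.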
Quick Start & Requirements
Install with pip: tensorflow-gpu 2.8 or later (or a compatible TF2 version), librosa, pypinyin, keras-bert, tensorflow-addons, tqdm, tf2onnx, and onnxruntime (or onnxruntime-gpu).
Configure am_data.yml and the model YAML files, then run python train_asr.py for training or python test_asr.py for testing. Pre-trained models are available via Baidu Netdisk links.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project focuses primarily on Chinese, both for ASR and for TTS data augmentation. TTS augmentation requires punctuation to be removed manually from the input text and specific models to be downloaded. The community group has a capacity limit.
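The punctuation-removal step can be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library; `strip_punctuation` is a hypothetical helper, not part of this project, and it assumes that dropping every character in a Unicode punctuation category is acceptable for the TTS input.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Remove characters whose Unicode category starts with 'P'.

    This covers ASCII and full-width CJK punctuation. Symbols such as
    '$' or '~' are in the 'S' categories and are left untouched.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


print(strip_punctuation("你好，世界！今天天气怎么样？"))  # prints 你好世界今天天气怎么样
```

Whitespace is preserved, so text that mixes Chinese and Latin words keeps its word boundaries.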
4 months ago
Inactive