TensorflowASR by Z-yq

ASR toolkit for CPU/edge deployment, approaching GPU model performance

Created 6 years ago

474 stars

Top 64.3% on SourcePulse

Project Summary

This project provides a TensorFlow 2 implementation of an end-to-end Automatic Speech Recognition (ASR) system based on the Conformer architecture. It targets researchers and developers who need high-performance ASR on CPU, and reports a real-time factor (RTF) below 0.1, meaning less than 0.1 seconds of compute per second of audio. It also provides voice activity detection (VAD), noise reduction, punctuation restoration, and TTS-based data augmentation to improve recognition accuracy.
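To make the RTF figure concrete, here is a minimal sketch of how a real-time factor can be measured. It is not taken from the project: `real_time_factor` and `measure_rtf` are hypothetical helper names, and `recognize` stands in for whatever callable runs the recognizer on one utterance.

```python
import time
from typing import Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent recognizing / duration of the audio recognized."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(recognize: Callable[[], object], audio_seconds: float) -> Tuple[object, float]:
    """Time one call to `recognize` (a hypothetical stand-in for the ASR call)
    and return its result together with the measured RTF."""
    start = time.perf_counter()
    result = recognize()
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, audio_seconds)
```

For example, a recognizer that needs 0.5 seconds to process a 10-second clip has an RTF of 0.05, which is inside the sub-0.1 range described above.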

How It Works

The system uses what the project calls a CTC + translate structure, a CTC-based design built on a Conformer backbone. Conformer combines convolution with self-attention, which makes it effective at capturing both local and global dependencies in speech. Both offline recognition and online streaming are supported: streaming uses Block Conformer (with Global CTC) for short utterances and Chunk Conformer (with CTC Picker) for longer ones. The project also includes a Mel Layer implemented in TensorFlow 2 that mirrors librosa's functionality, and it supports ONNX export for C++ and Python inference.
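As background for the CTC-based design, the sketch below shows standard greedy CTC decoding: take the best label in each frame, merge consecutive repeats, then drop the blank symbol. This is the textbook rule, not the project's Global CTC or CTC Picker, whose details are not described here. The function name `ctc_greedy_decode` and the choice of `blank_id=0` are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int = 0) -> List[int]:
    """Greedy CTC decoding over per-frame label scores.

    frame_scores[t][k] is the score of label k at frame t.
    blank_id=0 is an assumption; use whatever index the model reserves for blank.
    """
    decoded: List[int] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        # Index of the highest-scoring label in this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it differs from the previous frame and is not blank.
        if best != previous and best != blank_id:
            decoded.append(best)
        previous = best
    return decoded
```

Note that a blank between two identical labels keeps them separate, so a frame-level path such as 1, 1, blank, 1, 2, 2, blank decodes to 1, 1, 2.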

Quick Start & Requirements

Install: pip install "tensorflow-gpu>=2.8" (or a compatible TF2 version), plus librosa, pypinyin, keras-bert, tensorflow-addons, tqdm, tf2onnx, and onnxruntime (or onnxruntime-gpu).
Prerequisites: Python 3.6+ as stated by the project (TensorFlow 2.8 itself requires Python 3.7 or newer), TensorFlow 2.8+.
Usage: Configure am_data.yml and model YAML files, then run python train_asr.py for training or python test_asr.py for testing. Pre-trained models are available via Baidu Netdisk links.
Docs: V1 Version

Highlighted Details

Achieves CPU RTF < 0.1 with Conformer models.
Implements VAD, noise reduction, punctuation restoration, and TTS data augmentation.
Supports both online streaming (Block/Chunk Conformer) and offline ASR.
Offers ONNX-based C++ and Python inference.
Benchmarked against other popular ASR toolkits like Wenet and FunASR on Aishell-1.
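The difference between offline and streaming recognition comes down to how audio reaches the model: all at once, or in pieces as it arrives. The sketch below only illustrates that chunked feeding pattern in plain Python. It is not the project's Block or Chunk Conformer logic, and `iter_chunks`, `chunk_size`, and `hop_size` are hypothetical names; the chunk lengths and context handling the project actually uses are not specified here.

```python
from typing import Iterator, List, Sequence


def iter_chunks(samples: Sequence[float], chunk_size: int, hop_size: int) -> Iterator[List[float]]:
    """Yield successive chunks of `samples` for incremental processing.

    hop_size == chunk_size gives non-overlapping chunks; a smaller hop_size
    makes neighbouring chunks overlap. The final chunk may be shorter.
    """
    if chunk_size <= 0 or hop_size <= 0:
        raise ValueError("chunk_size and hop_size must be positive")
    for start in range(0, len(samples), hop_size):
        yield list(samples[start:start + chunk_size])
        # Stop once a chunk has reached the end of the signal.
        if start + chunk_size >= len(samples):
            break
```

A streaming recognizer would consume these chunks one by one and emit partial results, whereas an offline recognizer would process the whole `samples` sequence in a single call.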

Maintenance & Community

Active development with recent updates to Chunk Conformer structure.
Community forum available; contact author for invite if full.
Other related projects include TensorflowTTS and an NLU BOT.

Licensing & Compatibility

Primarily licensed under Apache 2.0, which permits both commercial and non-commercial use.
The project additionally prohibits trading it as a commodity, a restriction that goes beyond the standard Apache 2.0 terms.

Limitations & Caveats

The project is primarily focused on Chinese ASR and TTS data augmentation. The TTS augmentation requires manual removal of punctuation from input text and specific model downloads. The community group has a capacity limit.

Explore Similar Projects

Squeezeformer by kssteven418

edgedict by theblackcat102

deep-text-recognition-benchmark by roatienza

f5-tts-mlx by lucasnewman

SenseVoice.cpp by lovemefan

RapidASR by RapidAI

CAT by thu-spmi

moonshine by moonshine-ai

athena by athena-team

speech-to-text-wavenet by buriburisuri

PaddleSpeech by PaddlePaddle

espnet by espnet