TensorflowASR by Z-yq

ASR toolkit for CPU/edge deployment, approaching GPU model performance

created 5 years ago
474 stars


Project Summary

This project provides a TensorFlow 2 implementation of an end-to-end Automatic Speech Recognition (ASR) system based on the Conformer architecture. It targets researchers and developers who need high-performance ASR on CPU, and reports a real-time factor (RTF) below 0.1. It also offers voice activity detection (VAD), noise reduction, punctuation restoration, and TTS-based data augmentation to improve ASR performance.
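RTF is conventionally the ratio of processing time to audio duration, so a value below 0.1 means an utterance is decoded in under a tenth of its length. The following is a minimal sketch of how such a figure could be measured. The names real_time_factor and measure_rtf, and the recognize callable, are hypothetical and not part of the project.

```python
import time
from typing import Any, Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(
    recognize: Callable[[Any], Any], audio: Any, audio_seconds: float
) -> Tuple[Any, float]:
    """Time one call to `recognize` (a hypothetical ASR function) and return
    its result together with the measured RTF."""
    start = time.perf_counter()
    result = recognize(audio)
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, audio_seconds)


# Example: 10 s of audio decoded in 0.8 s meets an RTF < 0.1 target.
rtf = real_time_factor(0.8, 10.0)
print("meets target:", rtf < 0.1)
```

In practice the measurement would be averaged over many utterances, since a single call is sensitive to warm-up and caching effects.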

How It Works

The system uses a CTC-based architecture (described by the project as "CTC + translate") on a Conformer backbone, an architecture known for capturing both local and global dependencies in speech. It supports both online (streaming) and offline recognition. Streaming comes in two variants: Block Conformer with Global CTC for short utterances, and Chunk Conformer with CTC Picker for longer ones. The project also includes a Mel layer implemented in TensorFlow 2 that mirrors librosa's functionality, and it supports ONNX for inference from C++ and Python.
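To illustrate what a CTC-based output looks like at decode time, here is a minimal sketch of generic greedy CTC decoding: take the best label per frame, collapse consecutive repeats, and drop blanks. It is not the project's decoder. The function name and the choice of blank_id=0 are assumptions made for the example.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Greedy CTC decoding over per-frame label scores.

    blank_id=0 is an assumption; the real blank index depends on the model.
    """
    # Pick the highest-scoring label for every frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    decoded: List[int] = []
    prev = None
    for label in best:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank_id:
            decoded.append(label)
        prev = label
    return decoded


# Frames whose best labels are 1, 1, blank, 1, 2, 2, blank.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
    [0.9, 0.05, 0.05],
]
print(ctc_greedy_decode(frames))
```

The blank between the two runs of label 1 is what allows a repeated label to survive the collapse step.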

Quick Start & Requirements

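  • Install: pip install "tensorflow-gpu>=2.8" (or a compatible TF2 version), plus librosa, pypinyin, keras-bert, tensorflow-addons, tqdm, tf2onnx, and onnxruntime (or onnxruntime-gpu).
  • Prerequisites: TensorFlow 2.8+ and a matching Python 3 release. The listing states Python 3.6+, but TensorFlow 2.8 itself requires Python 3.7 or newer.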
  • Usage: Configure am_data.yml and model YAML files, then run python train_asr.py for training or python test_asr.py for testing. Pre-trained models are available via Baidu Netdisk links.
  • Docs: V1 Version
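
Before training, it can help to confirm that the packages listed above are present. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8). The helper name installed_versions is hypothetical, and the package list is copied from the install line rather than from the project's own requirements file.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from the install line above.
REQUIRED = [
    "tensorflow-gpu",
    "librosa",
    "pypinyin",
    "keras-bert",
    "tensorflow-addons",
    "tqdm",
    "tf2onnx",
    "onnxruntime",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```

If onnxruntime-gpu or plain tensorflow is installed instead of the names listed, the corresponding entries will show as missing, so the list should be adjusted to the chosen variant.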

Highlighted Details

  • Achieves CPU RTF < 0.1 with Conformer models.
  • Implements VAD, noise reduction, punctuation restoration, and TTS data augmentation.
  • Supports both online streaming (Block/Chunk Conformer) and offline ASR.
  • Offers ONNX-based C++ and Python inference.
  • Benchmarked against other popular ASR toolkits like Wenet and FunASR on Aishell-1.
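
The streaming mode mentioned above processes audio incrementally rather than as one whole utterance. As a rough illustration of that general idea only, the sketch below splits a sequence of feature frames into fixed-size chunks, each paired with a few frames of left context. It does not reproduce the project's Block or Chunk Conformer; iter_chunks, chunk_size, and left_context are hypothetical names and parameters.

```python
from typing import Iterator, Sequence, Tuple


def iter_chunks(
    frames: Sequence, chunk_size: int, left_context: int = 0
) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield (context, chunk) pairs over `frames`.

    `context` holds up to `left_context` frames immediately before the chunk,
    so a model can see some history without reprocessing the whole utterance.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if left_context < 0:
        raise ValueError("left_context must be non-negative")
    for start in range(0, len(frames), chunk_size):
        ctx_start = max(0, start - left_context)
        yield frames[ctx_start:start], frames[start:start + chunk_size]


# Ten dummy frames, chunks of four, two frames of left context.
for context, chunk in iter_chunks(list(range(10)), chunk_size=4, left_context=2):
    print(context, chunk)
```

Smaller chunks lower latency but give the model less context per step, which is the usual trade-off in streaming recognition.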

Maintenance & Community

  • Active development with recent updates to Chunk Conformer structure.
  • Community forum available; contact author for invite if full.
  • Other related projects include TensorflowTTS and an NLU BOT.

Licensing & Compatibility

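  • Primarily licensed under Apache 2.0, which permits commercial and non-commercial use subject to the license's notice and attribution conditions.
  • The author additionally prohibits trading the project itself as a commodity, so the use is not fully unrestricted.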

Limitations & Caveats

The project focuses primarily on Chinese ASR. Its TTS data augmentation requires punctuation to be removed manually from the input text and depends on specific model downloads. The community group has a capacity limit.
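
The punctuation-removal step for TTS input text can be scripted. The following is a minimal sketch, assuming that "punctuation" means any character in a Unicode punctuation category, which covers both ASCII and full-width Chinese marks. The function name strip_punctuation is hypothetical and not part of the project.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Remove every character whose Unicode category starts with 'P'
    (punctuation), leaving letters, digits, whitespace, and symbols."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


print(strip_punctuation("你好，世界！"))
print(strip_punctuation("Hello, world."))
```

Symbols such as currency or math signs fall under other Unicode categories and are left in place, so they would need separate handling if the TTS front end rejects them.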

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 2 stars in the last 90 days
