End-to-end ASR for Mandarin Chinese
This project provides a PyTorch implementation of the Speech Transformer, an end-to-end automatic speech recognition (ASR) system that converts acoustic features directly into character sequences with a single neural network. It targets researchers and developers working on Mandarin Chinese ASR.
How It Works
The Speech Transformer uses a Transformer architecture rather than the recurrent neural networks (RNNs) found in traditional ASR models. A Transformer can process all positions of an input sequence in parallel, which can speed up training and inference. The model maps acoustic features directly to character sequences, which simplifies the ASR pipeline by removing the need for separate acoustic, pronunciation, and language models.
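To make the parallel-processing point concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of a Transformer, written in plain Python. It is a simplification and not this project's code: it assumes a single head, no learned query/key/value projections, and no masking, and the toy frame values are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    frames: list of T feature vectors, each a list of d floats.
    Returns T output vectors; each is a weighted average of all frames.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    outputs = []
    # Each query is handled independently of the others, so a real
    # implementation can compute all positions at once as matrix products.
    for query in frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in frames
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, frames))
            for i in range(d)
        ])
    return outputs


# Three toy "acoustic frames" with two features each (illustrative values).
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(toy_frames)
```

Unlike an RNN, no output here depends on the output of a previous time step, which is what allows the positions to be computed together.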
Quick Start & Requirements
Install: pip install -r requirements.txt, then cd tools; make KALDI=/path/to/kaldi.
Run: go to egs/aishell/ and run bash run.sh. Parameters can be modified via command-line arguments (e.g., --stage 3, --batch_size <lower-value>).
Visualization: use visdom by running visdom in one terminal and bash run.sh --visdom 1 --visdom_id "<any-string>" in another. Access via http://<your-remote-server-ip>:8097.
Resume training: bash run.sh --continue_from <model-path>.
Batch size: lower batch_size with --batch_size <lower-value>.
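The run.sh options mentioned above can be assembled programmatically. The sketch below only builds and prints a command line; it does not run anything. The helper name build_run_command is hypothetical, the values passed in are illustrative, and it assumes the flags can be combined in a single call, which the source does not confirm.

```python
import shlex


def build_run_command(stage=None, batch_size=None, visdom_id=None,
                      continue_from=None):
    """Assemble the argument list for `bash run.sh` from optional settings.

    Only flags named in the quick start are used; this helper is a
    hypothetical convenience, not part of the project.
    """
    args = ["bash", "run.sh"]
    if stage is not None:
        args += ["--stage", str(stage)]
    if batch_size is not None:
        args += ["--batch_size", str(batch_size)]
    if visdom_id is not None:
        # The quick start pairs --visdom 1 with --visdom_id.
        args += ["--visdom", "1", "--visdom_id", visdom_id]
    if continue_from is not None:
        args += ["--continue_from", continue_from]
    return args


# Illustrative values; pick ones that suit your setup.
command = build_run_command(stage=3, batch_size=8, visdom_id="demo")
print(shlex.join(command))
# prints: bash run.sh --stage 3 --batch_size 8 --visdom 1 --visdom_id demo
```

shlex.join (Python 3.8+) quotes arguments that contain spaces, so the printed line can be pasted into a shell opened in egs/aishell/.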
Highlighted Details
Optional visualization with visdom.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
2 years ago
Inactive