Speech-Transformer by kaituoxu

End-to-end ASR for Mandarin Chinese

Created 6 years ago
803 stars

Top 43.9% on SourcePulse

Project Summary

This project provides a PyTorch implementation of the Speech Transformer, an end-to-end automatic speech recognition (ASR) system that converts acoustic features directly into character sequences with a single neural network. It is aimed at researchers and developers working on Mandarin Chinese ASR. Its main benefit is a streamlined, single-network approach to ASR.

How It Works

The Speech Transformer uses a Transformer network architecture instead of the recurrent neural network (RNN)-based models traditionally used for ASR. Self-attention processes all time steps of a sequence at once rather than one step at a time, so training can be parallelized and may run faster; decoding, however, still emits characters one at a time. The model maps acoustic features directly to character sequences, which simplifies the ASR pipeline by removing the need for separate acoustic, pronunciation, and language models.
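The parallelism comes from self-attention: each output frame is a weighted sum over all input frames, computed with matrix products instead of a step-by-step recurrence. The sketch below shows single-head scaled dot-product attention, softmax(QKᵀ/√d_k)·V, in plain Python purely for illustration. It is not code from this repository (which builds on PyTorch modules), and it leaves out multi-head projections, masking, and positional encodings.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for a single head, with no masking."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    # Every output row mixes all value rows in one step: no recurrence over time.
    return matmul(weights, v)


if __name__ == "__main__":
    # Three toy 2-dimensional "frames" used as Q, K and V (i.e. self-attention).
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in scaled_dot_product_attention(frames, frames, frames):
        print([round(x, 3) for x in row])
```

Because the attention weights in each row sum to 1, every output frame is a convex combination of the value frames, and all rows can be computed independently of one another.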

Quick Start & Requirements

  • Install: pip install -r requirements.txt, then cd tools; make KALDI=/path/to/kaldi.
  • Prerequisites: Python 3 (Anaconda recommended), PyTorch 0.4.1+, Kaldi (for feature extraction).
  • Dataset: Requires the Aishell dataset.
  • Usage: Navigate to egs/aishell/ and run bash run.sh. Parameters can be modified via command-line arguments (e.g., --stage 3, --batch_size <lower-value>).
  • Visualization: Use visdom by running visdom in one terminal and bash run.sh --visdom 1 --visdom_id "<any-string>" in another. Access via http://<your-remote-server-ip>:8097.
  • Resuming Training: Use bash run.sh --continue_from <model-path>.
  • Out of Memory: Reduce batch_size.
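The run.sh options listed above can also be assembled from a script, for example to queue several experiments. The helper below is hypothetical and not part of the repository: it only uses the flags named in this summary (--stage, --batch_size, --visdom, --visdom_id, --continue_from) and does not validate them against run.sh itself. The model path in the usage line is a placeholder.

```python
import shlex
import subprocess
from typing import List, Optional


def build_run_command(stage: Optional[int] = None,
                      batch_size: Optional[int] = None,
                      visdom_id: Optional[str] = None,
                      continue_from: Optional[str] = None) -> List[str]:
    """Assemble a `bash run.sh ...` argument list (hypothetical helper).

    Flag names are taken from the project summary, not parsed from run.sh.
    """
    cmd = ["bash", "run.sh"]
    if stage is not None:
        cmd += ["--stage", str(stage)]  # start the recipe from a later stage
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]  # lower this on out-of-memory errors
    if visdom_id is not None:
        cmd += ["--visdom", "1", "--visdom_id", visdom_id]  # needs a running `visdom` server
    if continue_from is not None:
        cmd += ["--continue_from", continue_from]  # resume training from a saved model
    return cmd


def launch(cmd: List[str], recipe_dir: str = "egs/aishell") -> int:
    """Run the recipe from its directory and return the process exit code."""
    return subprocess.run(cmd, cwd=recipe_dir).returncode


if __name__ == "__main__":
    # Print (rather than execute) a resume command with a smaller batch size.
    # "exp/model.pth" is a placeholder path, not a file the project guarantees.
    print(shlex.join(build_run_command(stage=3, batch_size=16,
                                       continue_from="exp/model.pth")))
```

Keeping command construction separate from `launch` makes it easy to inspect or log a command before running it.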

Highlighted Details

  • Achieves a Character Error Rate (CER) of 12.8% on the Aishell dataset with the SpeechTransformer model.
  • Provides a clear workflow breakdown: Data Preparation, Feature Generation, Dictionary and JSON Data Preparation, Network Training, and Decoding.
  • Includes options for visualizing training loss with visdom.
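CER, the metric behind the 12.8% figure, is conventionally the character-level edit distance (substitutions + deletions + insertions) between the recognized text and the reference, divided by the number of reference characters. The sketch below implements that standard definition. This summary does not say which scoring tool produced the reported number, and real scoring scripts may normalize text differently, so treat this as illustrative only.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion: ref char missing from hyp
                          curr[j - 1] + 1,         # insertion: extra char in hyp
                          prev[j - 1] + (r != h))  # substitution, or free match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references contain no characters")
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    return errors / total


if __name__ == "__main__":
    # One substitution (气 -> 七) and one deletion (很) over 6 reference characters.
    print(f"CER = {cer(['今天天气很好'], ['今天天七好']):.3f}")  # prints "CER = 0.333"
```

Summing edits and reference lengths across the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.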

Maintenance & Community

  • The project references a specific ICASSP 2019 paper, indicating a research-oriented origin. No explicit community links (Discord, Slack) or recent activity indicators are present in the README.

Licensing & Compatibility

  • The README does not state a license. Kaldi, a required dependency, is Apache 2.0-licensed, but a dependency's license says nothing about this project's own terms. Without an explicit license, reuse rights are unclear and should be confirmed with the author.

Limitations & Caveats

  • Feature extraction depends on Kaldi, a large toolkit that must be installed and built separately before the recipe can run.
  • The code targets PyTorch 0.4.1, a 2018 release. APIs that have changed since then may cause errors or deprecation warnings on current PyTorch versions and complicate use alongside newer libraries.
  • The provided results are specific to the Aishell dataset and Mandarin Chinese; performance on other languages or datasets is not detailed.

Health Check
Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
5 stars in the last 30 days
