Speech-Transformer by kaituoxu

End-to-end ASR for Mandarin Chinese

Created 7 years ago

804 stars

Top 43.9% on SourcePulse

Project Summary

This project provides a PyTorch implementation of the Speech Transformer, an end-to-end automatic speech recognition (ASR) system that converts acoustic features directly into character sequences using a single neural network. It targets researchers and developers working on Mandarin Chinese ASR.

How It Works

The Speech Transformer uses a Transformer architecture rather than the recurrent neural networks (RNNs) of traditional ASR models. Because the model does not step through the input one frame at a time, it can process all positions of an input sequence in parallel, potentially leading to faster training and inference. It maps acoustic features directly to character sequences, which simplifies the ASR pipeline by removing the need for separate acoustic, pronunciation, and language models.
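
To make the parallelism point concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is not taken from the repository: the function names are hypothetical, and it assumes identity query/key/value projections and a single head for brevity. What it shows is that each output position is computed from the input frames alone, so no position has to wait for another position's output, unlike an RNN step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head self-attention with identity projections (illustrative assumption).

    frames: list of feature vectors, one per time step.
    Returns one output vector per time step.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs = []
    # Each iteration reads only `frames`, never `outputs`, so the
    # iterations are independent and could run in parallel.
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights = softmax(scores)
        mixed = [sum(w * value[j] for w, value in zip(weights, frames)) for j in range(dim)]
        outputs.append(mixed)
    return outputs


# Three toy "acoustic frames" with two features each.
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(toy_frames)
```

A real implementation would use learned projections, multiple heads, and batched matrix multiplications on tensors; the loop here only illustrates the data dependencies.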

Quick Start & Requirements

Install: pip install -r requirements.txt, then cd tools; make KALDI=/path/to/kaldi.
Prerequisites: Python 3 (Anaconda recommended), PyTorch 0.4.1+, Kaldi (for feature extraction).
Dataset: Requires the Aishell dataset.
Usage: Navigate to egs/aishell/ and run bash run.sh. Parameters can be modified via command-line arguments (e.g., --stage 3, --batch_size <lower-value>).
Visualization: Use visdom by running visdom in one terminal and bash run.sh --visdom 1 --visdom_id "<any-string>" in another. Access via http://<your-remote-server-ip>:8097.
Resuming Training: Use bash run.sh --continue_from <model-path>.
Out of Memory: Reduce batch_size.

Highlighted Details

Achieves a Character Error Rate (CER) of 12.8% on the Aishell dataset with the SpeechTransformer model.
Provides a clear workflow breakdown: Data Preparation, Feature Generation, Dictionary and Json Data Preparation, Network Training, and Decoding.
Includes options for visualizing training loss with visdom.
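
For readers unfamiliar with the CER metric quoted above, the sketch below computes it the usual way: the character-level edit distance between the reference and the hypothesis, divided by the reference length. This is a generic illustration with hypothetical function names, not the scoring code the project uses.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted character in a six-character reference.
example_cer = character_error_rate("今天天气很好", "今天天汽很好")
```

Because the denominator is the reference length, a hypothesis with many insertions can produce a CER above 100%.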

Maintenance & Community

The project references a specific ICASSP 2019 paper, indicating a research-oriented origin. No explicit community links (Discord, Slack) or recent activity indicators are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project depends on Kaldi, which is licensed under Apache 2.0, and that may hint at a permissive license, but a dependency's license does not establish this project's terms. Check the repository for a license file before reusing the code.

Limitations & Caveats

The implementation requires Kaldi for feature extraction, adding a dependency.
The stated PyTorch requirement (0.4.1+) refers to an old release, so the code may need changes to run on current PyTorch versions or alongside newer libraries.
The provided results are specific to the Aishell dataset and Mandarin Chinese; performance on other languages or datasets is not detailed.

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 0 stars in the last 30 days
