attention-lvcsr by rizar

End-to-end attention-based speech recognition

Created 10 years ago

265 stars

Top 96.5% on SourcePulse

Project Summary

This project provides an end-to-end attention-based large vocabulary speech recognition system, serving as a reference implementation for associated research papers. It is targeted at researchers and practitioners in speech recognition who need to understand or reproduce the results of the cited work. The primary benefit is the availability of a working, albeit dated, implementation of an attention-based ASR system.

How It Works

The system uses an attention mechanism that lets the model focus on the relevant parts of the input audio sequence as it generates each output token. This approach, detailed in the referenced papers, is an alternative to traditional frame-synchronous models: it maps a variable-length audio sequence directly to a variable-length text sequence.
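
To make the idea concrete, here is a minimal sketch of a single attention step in plain Python. It is not taken from the project's code: the function names (softmax, attend) are hypothetical, and it assumes simple dot-product scoring, which is a simplification chosen for illustration and may differ from the scoring function used in the referenced papers.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """One attention step (illustrative, dot-product scoring assumed).

    decoder_state:  list of floats, the current decoder state.
    encoder_states: list of equally sized float lists, one per audio frame.
    Returns (weights, context): one weight per frame, and the weighted
    average of the encoder states.
    """
    # Score each encoded frame against the decoder state.
    scores = [
        sum(q * h for q, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoded frames
    weights, context = attend([2.0, 0.0], frames)
    print("weights:", weights)
    print("context:", context)
```

In a full recognizer the decoder would repeat this step for every output token, so the number of output tokens does not have to match the number of input frames.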

Quick Start & Requirements

Installation: Compile Kaldi with the --shared and --use-cuda=no options, install the Python packages pykwalify, toposort, pyyaml, numpy, pandas, and pyfst, then install kaldi-python by running python setup.py install.
Prerequisites: Kaldi, OpenFst, Python packages listed above. The Wall Street Journal (WSJ) dataset (LDC93S6B and LDC94S13B) is required for replicating results.
Dependencies: The codebase bundles custom modified subtrees of Theano, Blocks, Fuel, picklable-itertools, and Blocks-extras.
Setup: See exp/wsj for instructions on replicating the results on the WSJ dataset.
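
Before attempting the setup, it can help to confirm that the Python packages listed above are importable. The sketch below is not part of the project; the helper name missing_modules is hypothetical, and the import names are assumptions (pyyaml is imported as yaml, and pyfst is assumed here to be imported as fst). kaldi-python is left out because its import name is not stated above.

```python
import importlib.util

# Assumed top-level import names for the packages listed above.
# "yaml" is the import name of pyyaml; "fst" for pyfst is an assumption.
REQUIRED_MODULES = ["pykwalify", "toposort", "yaml", "numpy", "pandas", "fst"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All listed Python modules were found.")
```

The check only confirms that a module can be located; it does not verify versions or that Kaldi and OpenFst were built correctly.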

Highlighted Details

Reference implementation for "End-to-End Attention-based Large Vocabulary Speech Recognition" and "Task Loss Estimation for Sequence Prediction" papers.
Includes custom modified subtrees for Theano, Blocks, Fuel, picklable-itertools, and Blocks-extras.

Maintenance & Community

This codebase is no longer maintained and is based on outdated technologies (Theano, Blocks). Users are recommended to explore more modern implementations.

Licensing & Compatibility

License: MIT.
Compatibility: The MIT license permits commercial use, but the project's reliance on outdated technologies and its lack of maintenance may make it difficult to run on modern systems.

Limitations & Caveats

The project states that it is no longer maintained and that it relies on outdated technologies such as Theano and Blocks, and it recommends that users seek more modern alternatives. This limits its practical use for current ASR development to reference and reproduction purposes.
