espresso by freewym

Fast end-to-end neural speech recognition

Created 6 years ago
943 stars

Top 38.8% on SourcePulse

Project Summary

Espresso is a PyTorch-based toolkit for end-to-end neural automatic speech recognition (ASR). It offers modularity and extensibility, supporting distributed training and advanced decoding techniques like look-ahead word-based language model fusion. Espresso is designed for researchers and practitioners working with large-scale speech datasets, providing state-of-the-art recipes for WSJ, LibriSpeech, and Switchboard.
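Language model fusion rescores each candidate token during beam search by adding a weighted LM log-probability to the ASR log-probability. The sketch below shows only plain shallow fusion at a single decoding step. It is not Espresso's look-ahead word-based fusion, and the function name `shallow_fusion_step` and the `lm_weight` parameter are hypothetical illustrations.

```python
import math
from typing import Dict, List, Tuple


def shallow_fusion_step(
    asr_log_probs: Dict[str, float],
    lm_log_probs: Dict[str, float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score(y) = log p_asr(y) + lm_weight * log p_lm(y).

    Tokens unknown to the LM receive a small floor probability.
    """
    floor = math.log(1e-10)
    fused = [
        (token, lp + lm_weight * lm_log_probs.get(token, floor))
        for token, lp in asr_log_probs.items()
    ]
    # Rank candidates by fused score, best first, as a beam search would.
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


asr = {"their": math.log(0.5), "there": math.log(0.45), "the": math.log(0.05)}
lm = {"their": math.log(0.1), "there": math.log(0.6), "the": math.log(0.3)}
print(shallow_fusion_step(asr, lm)[0][0])
```

Here the acoustic model alone slightly prefers "their", but the LM term shifts the top candidate to "there"; setting `lm_weight=0.0` recovers the ASR-only ranking.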

How It Works

Espresso is built upon PyTorch and fairseq, leveraging their robust deep learning and neural machine translation capabilities. It supports various architectures, including Conformer encoders and Transducer models, and integrates features like SpecAugment and on-the-fly feature extraction from raw waveforms. This modular design allows for easy experimentation with different components and training strategies, aiming for fast and efficient ASR model development.
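SpecAugment augments training data by masking random bands of frequency bins and random spans of time frames in the input features. The following is a minimal pure-Python sketch of those two masks, assuming features are a list of frames (rows) by frequency bins (columns) and that masked values are set to zero, as is common with mean-normalized features. The function `spec_augment` and its parameters are hypothetical, not Espresso's API, and time warping from the original SpecAugment recipe is omitted.

```python
import random
from typing import List, Optional


def spec_augment(
    features: List[List[float]],
    num_freq_masks: int = 2,
    max_freq_width: int = 5,
    num_time_masks: int = 2,
    max_time_width: int = 10,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Hypothetical helper: return a masked copy of a (frames x bins) matrix."""
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in features]  # never modify the caller's features
    num_frames = len(out)
    num_bins = len(out[0]) if out else 0

    # Frequency masks: zero the same band of bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = 0.0

    # Time masks: zero whole consecutive frames.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_bins
    return out


feats = [[1.0] * 20 for _ in range(50)]
masked = spec_augment(feats, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the masks reproducible, which helps when debugging an augmentation pipeline.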

Quick Start & Requirements

  • Installation: pip install --editable . (after cloning the repository).
  • Prerequisites: Python >= 3.8, PyTorch >= 1.10.0. For training: NVIDIA GPU and NCCL.
  • Additional Dependencies: kaldi_io, sentencepiece, soundfile. Kaldi is required for data preparation, feature extraction, and hybrid system decoding. PyChain and OpenFst are needed for LF-MMI training. NVIDIA's Apex library is recommended for faster training.
  • Setup: Requires manual configuration of paths for Kaldi and Python, and potentially compiling PyChain/OpenFst.
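Because the setup spans several dependencies, a quick pre-flight check can save time before running a recipe. The script below is a hypothetical helper, not part of Espresso: it checks the Python and PyTorch minimums listed above and whether the Python packages `kaldi_io`, `sentencepiece`, and `soundfile` can be found. Kaldi, PyChain, and OpenFst are external builds and are deliberately not checked.

```python
import importlib.util
import re
import sys
from importlib import metadata
from typing import Dict, Tuple

# Python-importable dependencies from the requirements above. Kaldi, PyChain,
# and OpenFst are external builds and are NOT checked by this sketch.
PYTHON_PACKAGES = ["kaldi_io", "sentencepiece", "soundfile"]


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.10.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_version_ok(minimum: Tuple[int, int] = (1, 10)) -> bool:
    """True if an installed PyTorch distribution meets the minimum version."""
    try:
        return major_minor(metadata.version("torch")) >= minimum
    except (metadata.PackageNotFoundError, ValueError):
        return False


def check_prerequisites() -> Dict[str, bool]:
    status = {
        "python>=3.8": sys.version_info[:2] >= (3, 8),
        "torch>=1.10": torch_version_ok(),
    }
    for name in PYTHON_PACKAGES:
        # find_spec locates a top-level module without importing it.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'} {item}")
```

Versions are compared as integer tuples rather than strings, so 1.9 correctly ranks below 1.10.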

Highlighted Details

  • Supports Conformer, Transducer, and Transformer architectures.
  • Implements CTC model training and decoding.
  • Supports on-the-fly feature extraction from raw waveforms, reducing the dependency on Kaldi for some workflows.
  • Offers a fast, parallelized decoder for language model fusion.
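For intuition about CTC decoding, the simplest strategy is best-path (greedy) decoding: take the most likely label in each frame, merge consecutive repeats, then drop blanks. The sketch below assumes per-frame log-probabilities as nested lists and a blank index of 0; the function `ctc_greedy_decode` is hypothetical, and Espresso's own CTC decoding may instead use beam search with LM fusion.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_log_probs: Sequence[Sequence[float]], blank: int = 0
) -> List[int]:
    """Hypothetical best-path CTC decoder: argmax, merge repeats, drop blanks."""
    # Most likely label index for every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_log_probs
    ]
    output: List[int] = []
    prev: Optional[int] = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame's
        # label; a blank between two identical labels keeps both.
        if label != prev and label != blank:
            output.append(label)
        prev = label
    return output


path = [1, 1, 0, 1, 2, 2, 0]
frames = [[0.0 if i == t else -10.0 for i in range(3)] for t in path]
print(ctc_greedy_decode(frames))
```

The blank between the two runs of label 1 is what allows CTC to output the same label twice in a row.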

Maintenance & Community

Espresso was presented at the 2019 IEEE ASRU Workshop. The repository is hosted on GitHub. Specific community channels or active development status are not detailed in the README.

Licensing & Compatibility

Espresso is MIT-licensed, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The setup process involves several external dependencies (Kaldi, PyChain, OpenFst) and manual path configurations, which may be complex for users unfamiliar with these tools. While some recipes aim to reduce Kaldi dependency, it remains crucial for certain functionalities.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
