espresso by freewym

Fast end-to-end neural speech recognition

Created 7 years ago

942 stars

Top 38.8% on SourcePulse

2 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Soumith Chintala

Coauthor of PyTorch

Project Summary

Espresso is a PyTorch-based toolkit for end-to-end neural automatic speech recognition (ASR). It offers modularity and extensibility, supporting distributed training and advanced decoding techniques like look-ahead word-based language model fusion. Espresso is designed for researchers and practitioners working with large-scale speech datasets, providing state-of-the-art recipes for WSJ, LibriSpeech, and Switchboard.

How It Works

Espresso is built on PyTorch and fairseq, a sequence-modeling toolkit originally developed for neural machine translation. It supports several architectures, including Conformer encoders and Transducer models, and integrates features such as SpecAugment and on-the-fly feature extraction from raw waveforms. Because the components are modular, encoders, decoders, and training strategies can be swapped independently when running experiments.
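To make the SpecAugment step concrete, here is a minimal sketch of its two masking operations (frequency masking and time masking) in plain Python. It is an illustration of the general technique, not Espresso's implementation: the function name `spec_augment`, its parameters, and the list-of-lists feature layout are assumptions made for this example, and the time-warping part of SpecAugment is omitted.

```python
import random
from typing import List, Optional

Spectrogram = List[List[float]]  # layout assumed here: [time frame][frequency bin]


def spec_augment(
    features: Spectrogram,
    max_freq_mask: int = 2,
    max_time_mask: int = 2,
    mask_value: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Return a copy of `features` with one frequency band and one time span masked."""
    rng = rng or random.Random()
    out = [row[:] for row in features]  # copy so the input is left untouched
    num_frames = len(out)
    num_bins = len(out[0]) if out else 0
    if num_frames == 0 or num_bins == 0:
        return out

    # Frequency mask: pick a width, then a start bin, and blank that band in every frame.
    width = rng.randint(0, min(max_freq_mask, num_bins))
    start = rng.randint(0, num_bins - width)
    for row in out:
        for k in range(start, start + width):
            row[k] = mask_value

    # Time mask: pick a length, then a start frame, and blank those whole frames.
    length = rng.randint(0, min(max_time_mask, num_frames))
    first = rng.randint(0, num_frames - length)
    for i in range(first, first + length):
        out[i] = [mask_value] * num_bins

    return out
```

Passing a seeded `random.Random` makes the masks reproducible, which helps when comparing training runs.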

Quick Start & Requirements

Installation: pip install --editable . (after cloning the repository).
Prerequisites: Python >= 3.8, PyTorch >= 1.10.0. For training: NVIDIA GPU and NCCL.
Additional Dependencies: kaldi_io, sentencepiece, soundfile. Kaldi is required for data preparation, feature extraction, and hybrid system decoding. PyChain and OpenFst are needed for LF-MMI training. NVIDIA's Apex library is recommended for faster training.
Setup: Requires manual configuration of paths for Kaldi and Python, and potentially compiling PyChain/OpenFst.
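Before installing, it can help to confirm that the environment meets the prerequisites listed above. The following is a small sketch of such a check using only the Python standard library. The helper names (`parse_version`, `meets_minimum`, `find_missing`, `report`) are hypothetical and not part of Espresso; the module names are the usual import names of the listed packages.

```python
import importlib.util
import re
import sys
from typing import List, Sequence, Tuple

MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 10, 0)
REQUIRED_MODULES = ("torch", "kaldi_io", "sentencepiece", "soundfile")


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.0+cu113' into (1, 10, 0), ignoring any trailing suffix."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: Sequence[int]) -> bool:
    """Compare versions after zero-padding, so '1.10' satisfies (1, 10, 0)."""
    found = parse_version(version)
    width = max(len(found), len(minimum))
    padded_found = found + (0,) * (width - len(found))
    padded_min = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_min


def find_missing(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names of modules that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report() -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    missing = find_missing()
    problems.extend(f"missing module: {name}" for name in missing)
    if "torch" not in missing:
        from importlib.metadata import PackageNotFoundError, version
        try:
            if not meets_minimum(version("torch"), MIN_TORCH):
                problems.append("PyTorch >= 1.10.0 is required")
        except PackageNotFoundError:
            problems.append("could not determine the installed PyTorch version")
    return problems


if __name__ == "__main__":
    for line in report() or ["all checked prerequisites look satisfied"]:
        print(line)
```

This sketch does not check for Kaldi, PyChain, OpenFst, NCCL, or a GPU, all of which still need to be set up separately as described above.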

Highlighted Details

Supports Conformer, Transducer, and Transformer architectures.
Implements CTC model training and decoding.
Features on-the-fly feature extraction, reducing Kaldi dependency for some workflows.
Offers a fast, parallelized decoder for language model fusion.
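As background for the CTC item above, the simplest form of CTC decoding is greedy (best-path) decoding: take the highest-scoring label in each frame, collapse consecutive repeats, and drop blanks. The sketch below illustrates that rule in plain Python. It is not Espresso's decoder; the function name `ctc_greedy_decode` and the choice of index 0 for the blank label are assumptions made for this example.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = 0
) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, collapse repeats, remove blanks."""
    # Highest-scoring label index for each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Per-frame argmax is 1, 1, 0, 1, 2, 2, 0; the blank (0) separates the two 1s.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
    [0.8, 0.1, 0.1],
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The blank between the two runs of label 1 is what allows a repeated label to appear twice in the output; without it the two runs would collapse into one.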

Maintenance & Community

Espresso was presented at the 2019 IEEE ASRU Workshop. The repository is hosted on GitHub. Specific community channels or active development status are not detailed in the README.

Licensing & Compatibility

Espresso is MIT-licensed, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The setup process involves several external dependencies (Kaldi, PyChain, OpenFst) and manual path configurations, which may be complex for users unfamiliar with these tools. While some recipes aim to reduce Kaldi dependency, it remains crucial for certain functionalities.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days