pytorch-kaldi by mravanelli

Speech recognition toolkit bridging PyTorch and Kaldi

Created 7 years ago

2,392 stars

Top 19.0% on SourcePulse

3 Experts Love This Project

Soumith Chintala

Coauthor of PyTorch

Tri Dao

Chief Scientist at Together AI

Luca Antiga

CTO of Lightning AI

Project Summary

This project provides a toolkit for developing state-of-the-art hybrid DNN/HMM speech recognition systems, integrating PyTorch for deep learning components and Kaldi for feature extraction, label computation, and decoding. It's designed for researchers and engineers working on Automatic Speech Recognition (ASR) to build flexible and efficient systems.

How It Works

The toolkit bridges PyTorch and Kaldi, leveraging PyTorch's flexibility for neural network development and Kaldi's efficiency in traditional speech processing tasks. It allows for easy integration of custom acoustic models and offers several pre-implemented neural network architectures (MLP, CNN, RNN, LSTM, GRU, Li-GRU, SincNet). The system is configured via INI files, enabling complex architectures that combine multiple features and label streams, and supports multi-GPU training and distributed computing.
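
To make the INI-driven setup more concrete, here is a minimal sketch of how a configuration that combines two feature streams could be read with Python's standard configparser module. The section names, keys, and the load_experiment helper are hypothetical and chosen only for illustration; they are not the toolkit's actual configuration schema.

```python
import configparser

# Hypothetical config text: section and key names are illustrative only,
# not copied from the toolkit's real configuration files.
CFG_TEXT = """
[features_mfcc]
name = mfcc
dim = 13

[features_fbank]
name = fbank
dim = 40

[architecture]
type = MLP
inputs = features_mfcc, features_fbank
hidden_units = 1024
"""


def load_experiment(text):
    """Parse the INI text and compute the input size of the combined features."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    arch = parser["architecture"]
    # The architecture section lists the feature sections it consumes.
    input_sections = [s.strip() for s in arch["inputs"].split(",")]
    # Concatenating feature streams adds up their dimensions.
    input_dim = sum(parser[s].getint("dim") for s in input_sections)
    return {
        "type": arch["type"],
        "input_dim": input_dim,
        "hidden_units": arch.getint("hidden_units"),
    }


experiment = load_experiment(CFG_TEXT)
print(experiment)
```

Keeping the experiment description in a text file like this means a new feature combination can be tried by editing the configuration rather than the training code.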

Quick Start & Requirements

Installation: Clone the repository and install requirements using pip install -r requirements.txt.
Prerequisites:
- Kaldi toolkit (ensure binaries are in PATH).
- PyTorch (tested with versions 1.0 and 0.4; older versions may cause errors).
- Python (tested with 2.7 and 3.7; Anaconda recommended).
- NVIDIA GPU with CUDA (recommended; tested with CUDA 8.0, 9.0, and 9.1).
Setup: Follow the TIMIT or Librispeech tutorials for a guided setup.
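
Before starting a tutorial, it can help to confirm that the prerequisites above are visible from the Python environment. The following is a rough sketch using only the standard library; the two Kaldi binary names are examples, and which binaries a given experiment actually needs is an assumption. The helper names missing_tools and torch_available are hypothetical.

```python
import importlib.util
import shutil

# Example Kaldi binaries to look for. Which binaries a given experiment
# actually needs is an assumption made for this sketch.
EXAMPLE_KALDI_TOOLS = ["copy-feats", "compute-mfcc-feats"]


def missing_tools(tool_names):
    """Return the names from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


def torch_available():
    """Report whether a module named torch is installed, without importing it."""
    return importlib.util.find_spec("torch") is not None


missing = missing_tools(EXAMPLE_KALDI_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All example Kaldi binaries were found on PATH.")
print("PyTorch installed:", torch_available())
```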

Highlighted Details

Supports dynamic scheduling of batch size, learning rate, and dropout during training.
Enables joint training of speech enhancement and ASR systems.
Includes tutorials for TIMIT and Librispeech datasets, demonstrating various architectures and feature types (MFCC, fbank, fMLLR).
Achieved state-of-the-art results on TIMIT, with a phone error rate (PER) of 13.8% using a combined feature architecture.
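
As an illustration of the dynamic scheduling listed above, the sketch below models each hyperparameter as a piecewise-constant schedule: a list of (value, number of epochs) pairs. This representation and the value_at_epoch helper are assumptions made for the example, not the toolkit's own notation or code.

```python
def value_at_epoch(schedule, epoch):
    """Return the value active at a 0-based epoch.

    schedule is a list of (value, n_epochs) pairs applied in order.
    Past the end of the schedule, the last value is kept.
    """
    start = 0
    for value, n_epochs in schedule:
        if epoch < start + n_epochs:
            return value
        start += n_epochs
    return schedule[-1][0]


# Hypothetical schedules: values and durations are illustrative only.
batch_size = [(128, 2), (64, 3)]
learning_rate = [(0.08, 3), (0.04, 2)]
dropout = [(0.15, 4), (0.20, 1)]

for epoch in range(5):
    print(
        epoch,
        value_at_epoch(batch_size, epoch),
        value_at_epoch(learning_rate, epoch),
        value_at_epoch(dropout, epoch),
    )
```

Expressing the three schedules in the same form lets a training loop look up all of them with one helper at the start of each epoch.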

Maintenance & Community

The project encourages community contributions and feedback for future development, aiming to support a wider range of speech processing tasks. The README mentions a successor project, SpeechBrain, which is recommended for new development.

Licensing & Compatibility

Released under a Creative Commons Attribution 4.0 International license, which allows copying, distribution, and modification for research, commercial, and non-commercial purposes, provided the original paper is cited.

Limitations & Caveats

The project is no longer actively developed: the last commit was 3 years ago, and the successor project, SpeechBrain, is recommended for new work. The README mentions planned extensions to cover more speech-related tasks. Compatibility with PyTorch versions older than those tested (1.0 and 0.4) is not guaranteed.

Health Check

Last Commit

3 years ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days