pytorch-kaldi by mravanelli

Speech recognition toolkit bridging PyTorch and Kaldi

Created 7 years ago
2,391 stars

Top 19.2% on SourcePulse

Project Summary

This project provides a toolkit for developing state-of-the-art hybrid DNN/HMM speech recognition systems, integrating PyTorch for deep learning components and Kaldi for feature extraction, label computation, and decoding. It's designed for researchers and engineers working on Automatic Speech Recognition (ASR) to build flexible and efficient systems.

How It Works

The toolkit bridges PyTorch and Kaldi, leveraging PyTorch's flexibility for neural network development and Kaldi's efficiency in traditional speech processing tasks. It allows for easy integration of custom acoustic models and offers several pre-implemented neural network architectures (MLP, CNN, RNN, LSTM, GRU, Li-GRU, SincNet). The system is configured via INI files, enabling complex architectures that combine multiple features and label streams, and supports multi-GPU training and distributed computing.
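To illustrate the INI-driven configuration described above, here is a minimal sketch using Python's standard `configparser`. The section names (`experiment`, `architecture`) and all option names are hypothetical and chosen only for illustration; they are not the toolkit's actual configuration schema, which is documented in the project README.

```python
import configparser

# Hypothetical config text: section and option names are illustrative only,
# not the toolkit's real schema.
EXAMPLE_CFG = """
[experiment]
n_epochs = 24
use_cuda = True

[architecture]
arch_name = MLP_layers
hidden_dims = 1024,1024,1024
dropout = 0.15
"""


def load_experiment_config(text):
    """Parse INI text into a plain dict with typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    hidden = parser.get("architecture", "hidden_dims")
    return {
        "n_epochs": parser.getint("experiment", "n_epochs"),
        "use_cuda": parser.getboolean("experiment", "use_cuda"),
        "arch_name": parser.get("architecture", "arch_name"),
        # Comma-separated list of layer widths -> list of ints.
        "hidden_dims": [int(d) for d in hidden.split(",")],
        "dropout": parser.getfloat("architecture", "dropout"),
    }


config = load_experiment_config(EXAMPLE_CFG)
print(config)
```

Keeping the architecture in a text file like this means an experiment can be changed or reproduced without touching the training code.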

Quick Start & Requirements

  • Installation: Clone the repository and install requirements using pip install -r requirements.txt.
  • Prerequisites:
    • Kaldi toolkit (ensure binaries are in PATH).
    • PyTorch (tested with versions 1.0 and 0.4; older versions may cause errors).
    • Python (tested with 2.7 and 3.7; Anaconda recommended).
    • NVIDIA GPU with CUDA (tested with 9.0, 9.1, 8.0) is recommended.
  • Setup: Follow the TIMIT or Librispeech tutorials for a guided setup.
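Before starting a tutorial, it can help to confirm that the prerequisites above are visible from Python. The following is a small sketch, not part of the toolkit: `copy-feats` and `hmm-info` are two Kaldi executables chosen here as examples, and the helper names are hypothetical.

```python
import importlib.util
import shutil
import sys

# Two Kaldi executables used as a spot check (an assumption for this
# sketch; extend the tuple with whatever binaries your recipe needs).
KALDI_BINARIES = ("copy-feats", "hmm-info")


def missing_binaries(names, which=shutil.which):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def has_module(module_name):
    """True if the module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version:", "{}.{}".format(*sys.version_info[:2]))
    print("PyTorch importable:", has_module("torch"))
    missing = missing_binaries(KALDI_BINARIES)
    if missing:
        print("Kaldi binaries not on PATH:", ", ".join(missing))
    else:
        print("Kaldi binaries found on PATH.")
```

If any binary is reported missing, adding the Kaldi binary directories to `PATH` in your shell profile is the usual fix.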

Highlighted Details

  • Supports dynamic scheduling of batch size, learning rate, and dropout during training.
  • Enables joint training of speech enhancement and ASR systems.
  • Includes tutorials for TIMIT and Librispeech datasets, demonstrating various architectures and feature types (MFCC, fbank, fMLLR).
  • Reports state-of-the-art results on TIMIT, with a phone error rate (PER) of 13.8% using a combined feature architecture.
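The dynamic scheduling mentioned in the first item above can be pictured as a piecewise-constant lookup per epoch. The sketch below is a generic illustration under assumed conventions: the `(n_epochs, value)` tuple format and the function name `value_for_epoch` are hypothetical and do not reproduce the toolkit's own config syntax.

```python
def value_for_epoch(schedule, epoch):
    """Return the scheduled value for a 0-indexed epoch.

    schedule is a list of (n_epochs, value) pairs applied in order.
    Epochs beyond the end of the schedule reuse the last value.
    """
    boundary = 0
    for n_epochs, value in schedule:
        boundary += n_epochs
        if epoch < boundary:
            return value
    return schedule[-1][1]


# Hypothetical schedules: shrink batch size and learning rate over time,
# and lower dropout in the second half of training.
BATCH_SIZE = [(12, 128), (8, 64)]
LEARNING_RATE = [(10, 0.08), (5, 0.04), (5, 0.02)]
DROPOUT = [(10, 0.2), (10, 0.1)]

for epoch in (0, 9, 10, 15, 19):
    print(
        epoch,
        value_for_epoch(BATCH_SIZE, epoch),
        value_for_epoch(LEARNING_RATE, epoch),
        value_for_epoch(DROPOUT, epoch),
    )
```

In a PyTorch training loop, the looked-up learning rate would typically be written into each entry of `optimizer.param_groups` at the start of the epoch.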

Maintenance & Community

The project encourages community contributions and feedback for future development, aiming to support a wider range of speech processing tasks. The README mentions a successor project, SpeechBrain, which is recommended for new development.

Licensing & Compatibility

Released under a Creative Commons Attribution 4.0 International license, which allows copying, distribution, and modification for research, commercial, and non-commercial purposes, provided the original paper is cited.

Limitations & Caveats

The project no longer appears to be under active development: the last commit was 3 years ago, and the README recommends its successor, SpeechBrain, for new work. While it supports various models and features, the README describes extensions to more speech-related tasks only as plans. Compatibility with PyTorch versions older than those tested (0.4 and 1.0) is not guaranteed.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
