kaldi by kaldi-asr

Speech recognition toolkit for Linux, macOS, Cygwin, and Windows

Created 11 years ago

15,325 stars

Top 3.2% on SourcePulse


Project Summary

Kaldi is an open-source toolkit for automatic speech recognition (ASR) research and development. It provides tools and example recipes for building and deploying ASR systems, and is aimed primarily at speech researchers and developers who need to experiment with and customize recognition pipelines.

How It Works

Kaldi is built around a C++ core for efficiency. Its modular design lets users combine different acoustic models, language models, and decoding algorithms. The toolkit supports classical GMM-HMM systems as well as neural network acoustic models, most commonly in hybrid DNN-HMM configurations where the network replaces the GMM as the acoustic scorer.
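The separation between acoustic scoring and decoding described above can be pictured with a toy sketch. This is not Kaldi's API or its decoder (Kaldi is C++ and decodes over weighted finite-state transducers); the names `viterbi` and `gaussian_scorer` and the two-state model are hypothetical, chosen only to show how a decoder can stay unchanged while the acoustic scorer is swapped.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption for illustration): an acoustic model
# maps one feature frame to a list of per-state log-likelihoods.
AcousticModel = Callable[[Sequence[float]], List[float]]

NEG_INF = float("-inf")


def viterbi(frames, acoustic_model: AcousticModel, log_trans, log_init):
    """Return the most likely HMM state sequence for the given frames."""
    n_states = len(log_init)
    scores = [log_init[s] + e for s, e in enumerate(acoustic_model(frames[0]))]
    backptrs = []
    for frame in frames[1:]:
        emit = acoustic_model(frame)
        new_scores, ptrs = [], []
        for s in range(n_states):
            # Best predecessor for state s at this frame.
            best_prev = max(range(n_states),
                            key=lambda p: scores[p] + log_trans[p][s])
            new_scores.append(scores[best_prev] + log_trans[best_prev][s] + emit[s])
            ptrs.append(best_prev)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


def gaussian_scorer(means, var=1.0) -> AcousticModel:
    """Toy stand-in for a GMM: one 1-D Gaussian per state."""
    def score(frame):
        x = frame[0]
        return [-0.5 * (math.log(2 * math.pi * var) + (x - m) ** 2 / var)
                for m in means]
    return score


# Two-state left-to-right HMM: state 0 may stay or advance, state 1 only stays.
LOG_TRANS = [[math.log(0.9), math.log(0.1)], [NEG_INF, 0.0]]
LOG_INIT = [0.0, NEG_INF]

if __name__ == "__main__":
    frames = [[0.1], [0.0], [2.9], [3.1]]
    print(viterbi(frames, gaussian_scorer([0.0, 3.0]), LOG_TRANS, LOG_INIT))
```

Any callable with the same signature, for example a wrapper around a neural network's output layer, could be passed in place of `gaussian_scorer` without touching the decoder, which is the kind of substitution a modular design allows.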

Quick Start & Requirements

Installation: Build instructions are provided in ./INSTALL for UNIX-like systems (Linux, macOS, Cygwin). Windows users should refer to windows/INSTALL.
Prerequisites: Requires a C++ compiler, make, cmake, and libraries such as LAPACK and OpenFst (packaged as lapack-devel and openfst-devel on Fedora). CUDA is supported for GPU acceleration.
Resources: Building from source can be time-consuming. Specific build commands and platform notes (Fedora, ppc64le, Android, WebAssembly) are available in the README.
Documentation: Project information, techniques, tutorials, and Doxygen references are available on the project site: http://kaldi-asr.org/.
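
Before starting a long build, it can help to confirm that the command-line prerequisites listed above are on the PATH. The following is a minimal sketch, not part of Kaldi: the executable names (`g++`, `clang++`, `c++`, `gmake`) are assumptions, since the summary only names "a C++ compiler, make, cmake", and the helper names `find_tools` and `missing_tools` are hypothetical. It does not check for libraries such as LAPACK or OpenFst, or for CUDA.

```python
import shutil

# Assumed executable names for each prerequisite; adjust for your platform.
REQUIRED_TOOLS = {
    "C++ compiler": ["g++", "clang++", "c++"],
    "make": ["make", "gmake"],
    "cmake": ["cmake"],
}


def find_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Map each prerequisite to the first matching executable path, or None."""
    found = {}
    for name, candidates in required.items():
        found[name] = next((p for p in map(which, candidates) if p), None)
    return found


def missing_tools(found):
    """Return the sorted names of prerequisites that were not found."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    if missing_tools(found):
        print("Missing:", ", ".join(missing_tools(found)))
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` searches the current PATH.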

Highlighted Details

Supports both traditional GMM-HMM and neural network (DNN) acoustic modeling, the latter typically in hybrid DNN-HMM systems.
Includes a wide range of recipes for various speech recognition tasks and datasets.
Offers advanced features like lattice-free MMI, speaker adaptation, and online decoding.
Cross-compilation support for Android and WebAssembly enables deployment on diverse platforms.

Maintenance & Community

Community support runs through user and developer mailing lists: http://kaldi-asr.org/forums.html.
Development follows a GitHub pull request model, adhering to the Google C++ Style Guide with minor exceptions.

Licensing & Compatibility

The toolkit is distributed under the Apache License 2.0, a permissive license that allows commercial use and integration into closed-source projects.

Limitations & Caveats

The build process can be complex and may require specific library versions or configurations. The README carries platform-specific notes and advises contacting the developers if build problems arise.

Health Check

Last Commit: 5 months ago
Responsiveness: Inactive

Star History

24 stars in the last 30 days