kaldi-active-grammar by daanzu

Python package for Kaldi speech recognition with dynamic grammars

created 6 years ago
342 stars

Top 81.9% on sourcepulse

Project Summary

This project provides a Python package for dynamic, context-aware speech recognition using the Kaldi engine, enabling granular control over active grammars during decoding. It targets developers building command-and-control applications, offering improved accuracy by reducing the search space and allowing shared dictation models.

How It Works

Kaldi Active Grammar (KaldiAG) extends Kaldi's decoding graph capabilities by allowing multiple grammars to be compiled separately and dynamically activated or deactivated on a per-utterance basis. This approach contrasts with traditional monolithic Kaldi graphs, enabling more efficient and accurate recognition by only considering relevant grammars for the current context. It integrates with the Dragonfly speech recognition framework, facilitating the definition of grammars and associated actions.
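The per-utterance activation idea can be illustrated with a toy model. This is a minimal sketch in plain Python and does not use the KaldiAG API: ToyGrammar, ToyRecognizer, and the exact-phrase matching are hypothetical stand-ins for compiled grammar graphs and real decoding. It only shows that each grammar is registered once and that the set of active grammars is chosen anew for every utterance.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGrammar:
    """Hypothetical stand-in for one separately compiled grammar."""
    name: str
    phrases: frozenset


@dataclass
class ToyRecognizer:
    """Hypothetical recognizer; real decoding is replaced by phrase lookup."""
    grammars: dict = field(default_factory=dict)

    def add(self, grammar):
        # Register ("compile") each grammar once; changing which grammars
        # are active later does not rebuild anything.
        self.grammars[grammar.name] = grammar

    def recognize(self, utterance, active):
        # Only grammars named in `active` are searched for this utterance,
        # which is what shrinks the search space.
        for name in active:
            if utterance in self.grammars[name].phrases:
                return name
        return None


recognizer = ToyRecognizer()
recognizer.add(ToyGrammar("editor", frozenset({"save file", "close tab"})))
recognizer.add(ToyGrammar("browser", frozenset({"close tab", "go back"})))

# The same utterance gives different results depending on the active set.
print(recognizer.recognize("go back", active=["editor"]))    # None
print(recognizer.recognize("go back", active=["browser"]))   # browser
print(recognizer.recognize("close tab", active=["editor"]))  # editor
```

In this toy, "go back" is rejected while only the editor grammar is active, which mirrors how restricting the active grammars to the current context avoids spurious matches.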

Quick Start & Requirements

  • Installation: pip install 'dragonfly2[kaldi]' (recommended for Dragonfly integration) or pip install kaldi-active-grammar (for direct use).
  • Prerequisites: 64-bit Python 3.6+ on Windows, Linux, or macOS. Requires a compatible Kaldi nnet3 chain model (provided in releases) and at least about 1 GB of disk space and RAM. For pronunciation generation, install an optional extra: pip install 'kaldi-active-grammar[g2p_en]' (local generation) or pip install 'kaldi-active-grammar[online]' (online service). Windows users may need to install the VC2017+ redistributable.
  • Resources: A self-contained Windows distribution (kaldi-dragonfly-winpython) is available for quick setup.
  • Documentation: Example usage is available in the README, with a demo video linked.
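Some of the prerequisites listed above can be checked from Python before installing a model. The following is a minimal sketch: check_environment is a hypothetical helper, not part of the package, and it assumes the package is imported as kaldi_active_grammar. It does not check disk space, RAM, or the presence of a model.

```python
import importlib.util
import struct
import sys


def check_environment():
    """Return a dict of basic prerequisite checks (hypothetical helper)."""
    return {
        # The package requires Python 3.6 or newer.
        "python_3_6_plus": sys.version_info >= (3, 6),
        # Pointer size in bits; 64 indicates a 64-bit interpreter.
        "64_bit": struct.calcsize("P") * 8 == 64,
        # True only if the package is importable in this environment
        # (module name assumed to be kaldi_active_grammar).
        "kaldi_active_grammar_installed":
            importlib.util.find_spec("kaldi_active_grammar") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("{}: {}".format(name, "ok" if ok else "missing"))
```

A "missing" result for the last check simply means that neither of the pip commands above has been run in the current environment.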

Highlighted Details

  • Includes pre-compiled Kaldi binaries for Windows, Linux, and macOS, simplifying setup.
  • Offers a compatible backend for the Dragonfly and Caster speech recognition frameworks.
  • Supports plain dictation with a pre-trained English model or custom Kaldi models.
  • Enables automatic pronunciation generation for unknown words via local or online services.

Maintenance & Community

The project is developed by David Zurow (@daanzu). Donations are appreciated to support development. Related repositories and a Docker image for Linux are listed.

Licensing & Compatibility

Licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). This is a strong copyleft license that may have implications for commercial or closed-source use. The project incorporates code from Kaldi ASR (Apache-2.0) and OpenFST (Apache-2.0).

Limitations & Caveats

Formal documentation is limited; example usage is found mainly in the README. The project relies on a specific fork of Kaldi that is not intended for standalone use. Conversion of standard Kaldi models is not yet fully implemented.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
