pyctcdecode by kensho-technologies

CTC beam search decoder for speech recognition

Created 4 years ago

467 stars

Top 65.1% on SourcePulse

Project Summary

This library provides a fast, Python-based CTC beam search decoder for speech recognition, aimed at researchers and developers working with models such as NVIDIA's Conformer-CTC or Facebook's Wav2Vec2. It supports BPE vocabularies, hotword boosting, and real-time decoding, and aims to match the performance of C++ implementations.

How It Works

The decoder implements CTC beam search in Python, leveraging optimizations like caching and beam pruning to achieve performance competitive with C++ implementations. It supports n-gram language models (e.g., KenLM) and integrates features like byte pair encoding (BPE) vocabulary handling and stateful language model decoding for real-time applications. This Python-centric approach facilitates rapid prototyping and experimentation with new features.
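To make the idea of CTC beam search with pruning concrete, here is a minimal pure-Python sketch of prefix beam search. It is an illustration only, not pyctcdecode's implementation: it has no language model, no caching, and no BPE handling. The function names, the `beam_width` parameter, and the convention that label index 0 is the CTC blank are all assumptions made for this example.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*values):
    """Numerically stable log(sum(exp(v))) for a few log-probabilities."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def ctc_prefix_beam_search(log_probs, labels, beam_width=8, blank=0):
    """Decode a (frames x vocab) list of log-probabilities into text.

    Each beam maps a label-index prefix to a pair of scores:
    (log p of paths ending in blank, log p of paths ending in non-blank).
    """
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_blank, p_non_blank) in beams.items():
            last = prefix[-1] if prefix else None
            for idx, lp in enumerate(frame):
                if idx == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (
                        logsumexp(b, p_blank + lp, p_non_blank + lp),
                        nb,
                    )
                    continue
                extended = prefix + (idx,)
                b, nb = next_beams[extended]
                if idx == last:
                    # A repeated label only extends the prefix after a blank...
                    next_beams[extended] = (b, logsumexp(nb, p_blank + lp))
                    # ...otherwise it collapses into the same prefix.
                    sb, snb = next_beams[prefix]
                    next_beams[prefix] = (sb, logsumexp(snb, p_non_blank + lp))
                else:
                    next_beams[extended] = (
                        b,
                        logsumexp(nb, p_blank + lp, p_non_blank + lp),
                    )
        # Beam pruning: keep only the highest-scoring prefixes.
        ranked = sorted(
            next_beams.items(),
            key=lambda item: logsumexp(*item[1]),
            reverse=True,
        )
        beams = dict(ranked[:beam_width])
    best_prefix = max(beams.items(), key=lambda item: logsumexp(*item[1]))[0]
    return "".join(labels[i] for i in best_prefix)


# Toy input: four frames over the vocabulary [blank, "a", "b"].
toy_probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
toy_log_probs = [[math.log(p) for p in frame] for frame in toy_probs]
print(ctc_prefix_beam_search(toy_log_probs, ["", "a", "b"]))  # prints "ab"
```

A production decoder would additionally fold a language model score into each prefix at word boundaries, which is where options such as KenLM integration come in.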

Quick Start & Requirements

Install via pip: pip install pyctcdecode
Optionally takes a KenLM language model file (.arpa or .bin); decoding also works without one.
BPE vocabularies are detected and handled automatically.
Examples are provided for integration with NVIDIA NeMo and Hugging Face Wav2Vec2 models.
Tutorials available for detailed usage: https://github.com/kensho-technologies/pyctcdecode/tree/main/tutorials
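As a rough sketch of what basic usage looks like after installation, the snippet below wraps the library's `build_ctcdecoder` factory and the decoder's `decode` method. The wrapper names `build_decoder` and `transcribe` are hypothetical helpers for this example, the `alpha` and `beta` values are placeholders rather than recommendations, and the `None` default for the model path is an assumption here; pass a real path to enable language model scoring.

```python
def build_decoder(labels, kenlm_model_path=None, alpha=0.5, beta=1.0):
    """Build a pyctcdecode decoder for the given vocabulary.

    labels: output tokens in the same order as the acoustic model's
        logit columns.
    kenlm_model_path: path to a .arpa or .bin KenLM file, if one is used.
    alpha, beta: language model weight and word insertion bonus
        (placeholder values; tune them on a validation set).
    """
    # Imported lazily so this module still loads where pyctcdecode is absent.
    from pyctcdecode import build_ctcdecoder

    return build_ctcdecoder(
        labels,
        kenlm_model_path=kenlm_model_path,
        alpha=alpha,
        beta=beta,
    )


def transcribe(decoder, logits):
    """Decode one utterance; logits is a 2-D (frames x vocab) array."""
    return decoder.decode(logits)
```

In practice `logits` would come from an acoustic model such as a NeMo or Wav2Vec2 checkpoint, and `labels` from that model's vocabulary.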

Highlighted Details

Hotword boosting for domain-specific accuracy.
Multi-LM support for combining multiple language models.
Native frame index annotation for word timing and confidence scores.
Batch decoding support via multiprocessing.
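The features above map onto a handful of decoder methods. The sketch below assumes a `decoder` object already built with pyctcdecode and uses its `decode` (with `hotwords` and `hotword_weight`), `decode_beams`, and `decode_batch` methods. The helper function names are hypothetical, and the five-field layout of each beam (text, LM state, word frames, logit score, LM score) is an assumption that may differ between library versions.

```python
import multiprocessing


def decode_with_hotwords(decoder, logits, hotwords, hotword_weight=10.0):
    """Boost domain-specific words during decoding."""
    return decoder.decode(
        logits, hotwords=hotwords, hotword_weight=hotword_weight
    )


def top_beam_word_frames(decoder, logits):
    """Return the best text plus per-word frame annotations.

    Assumes each beam unpacks into five fields; check the installed
    version's documentation before relying on this layout.
    """
    beams = decoder.decode_beams(logits)
    text, _lm_state, word_frames, _logit_score, _lm_score = beams[0]
    return text, word_frames


def decode_batch_parallel(decoder, logits_list, processes=None):
    """Decode several utterances in parallel worker processes.

    The "fork" start method is not available on Windows.
    """
    with multiprocessing.get_context("fork").Pool(processes) as pool:
        return decoder.decode_batch(pool, logits_list)
```

Frame indices can be converted to timestamps by multiplying by the acoustic model's frame duration, which depends on the model and is not fixed by the decoder.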

Maintenance & Community

Developed by Kensho Technologies, LLC.
No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

Licensed under the Apache 2.0 License.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README notes that default hyperparameter values are tuned for a specific use case, recommending users perform their own optimization for best results, especially with non-English languages.
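One simple way to carry out that optimization is a grid search over the language model weight and word insertion bonus, scored by word error rate on a held-out set. The sketch below is library-agnostic: `decode_fn` is a hypothetical callable that takes `(alpha, beta)` and returns one transcript per validation utterance, and averaging per-utterance error rates is a simplification of corpus-level WER.

```python
import itertools


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution
                )
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def tune_alpha_beta(decode_fn, references, alphas, betas):
    """Return (mean WER, alpha, beta) for the best grid point.

    decode_fn(alpha, beta) is a hypothetical callable supplied by the
    caller; it should decode the whole validation set with those values.
    """
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        hypotheses = decode_fn(alpha, beta)
        mean_wer = sum(
            word_error_rate(r, h) for r, h in zip(references, hypotheses)
        ) / len(references)
        if best is None or mean_wer < best[0]:
            best = (mean_wer, alpha, beta)
    return best
```

The grid values themselves are left to the caller, since suitable ranges depend on the language, the acoustic model, and the language model in use.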

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days