pyctcdecode by kensho-technologies

CTC beam search decoder for speech recognition

Created 4 years ago
460 stars

Top 65.8% on SourcePulse

Project Summary

This library provides a fast, Python-based CTC beam search decoder for speech recognition, aimed at researchers and developers working with models such as Nvidia's Conformer-CTC or Facebook's Wav2Vec2. It offers features such as BPE vocabulary support, hotword boosting, and real-time decoding, while aiming for performance on par with C++ implementations.
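To make the decoder's input concrete: a CTC acoustic model emits, for every audio frame, a probability distribution over the vocabulary plus a special blank token. The simplest way to turn that into text is greedy (best-path) decoding: take the most likely label per frame, merge consecutive repeats, and drop blanks. A beam search decoder improves on this by keeping several hypotheses alive. The sketch below shows only the greedy baseline. The toy vocabulary, the convention that index 0 is the blank, and the helper name `greedy_ctc_decode` are illustrative assumptions, not pyctcdecode's API.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank token


def greedy_ctc_decode(probs: Sequence[Sequence[float]], labels: List[str]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=lambda i: frame[i]) for frame in probs]
    out = []
    prev = None
    for idx in best_path:
        # Emit a label only if it differs from the previous frame's label and is
        # not blank; a blank between two equal labels therefore keeps both.
        if idx != prev and idx != BLANK:
            out.append(labels[idx])
        prev = idx
    return "".join(out)


labels = ["", "a", "b", " "]
# 6 frames x 4 labels of toy probabilities
probs = [
    [0.1, 0.8, 0.05, 0.05],  # a
    [0.1, 0.7, 0.1, 0.1],    # a (repeat, merged)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.7, 0.1, 0.1],    # a (new emission after the blank)
    [0.1, 0.1, 0.7, 0.1],    # b
    [0.6, 0.1, 0.2, 0.1],    # blank
]
print(greedy_ctc_decode(probs, labels))  # → aab
```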

How It Works

The decoder implements CTC beam search in Python, leveraging optimizations like caching and beam pruning to achieve performance competitive with C++ implementations. It supports n-gram language models (e.g., KenLM) and integrates features like byte pair encoding (BPE) vocabulary handling and stateful language model decoding for real-time applications. This Python-centric approach facilitates rapid prototyping and experimentation with new features.
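The core search can be illustrated with a textbook CTC prefix beam search in pure Python. This is a simplified sketch, not pyctcdecode's implementation: it works in plain probabilities rather than log space, has no language model, hotword, or BPE handling, and uses `beam_width` as its only pruning rule. For every candidate prefix it tracks the probability of paths ending in blank and ending in non-blank, which is what allows different frame alignments that collapse to the same text to be merged. The helper name `prefix_beam_search` and the blank-at-index-0 convention are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BLANK = 0  # assumption: index 0 is the CTC blank token


def prefix_beam_search(
    probs: Sequence[Sequence[float]], labels: List[str], beam_width: int = 8
) -> List[Tuple[str, float]]:
    """Textbook CTC prefix beam search without an LM; returns (text, prob), best first."""
    # Each prefix (tuple of label indices) maps to (p_blank, p_non_blank): the total
    # probability of all paths that collapse to it and end in blank / non-blank.
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if p == 0.0:
                    continue
                if idx == BLANK:
                    # A blank leaves the collapsed text unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == idx:
                    # Repeating the last label with no blank in between collapses into it...
                    next_beams[prefix][1] += p_nb * p
                    # ...so only paths ending in blank can append the same label again.
                    if p_b > 0.0:
                        next_beams[prefix + (idx,)][1] += p_b * p
                else:
                    next_beams[prefix + (idx,)][1] += (p_b + p_nb) * p
        # Beam pruning: keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_width]}
    results = [("".join(labels[i] for i in prefix), pb + pnb) for prefix, (pb, pnb) in beams.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)


labels = ["", "a"]
probs = [[0.6, 0.4], [0.6, 0.4]]  # 2 frames; blank is the per-frame argmax
for text, p in prefix_beam_search(probs, labels):
    print(repr(text), round(p, 2))
# 'a' 0.64
# '' 0.36
```

A per-frame argmax would pick blank twice and output the empty string, but summing the three paths that collapse to "a" (a-a, a-blank, blank-a: 0.16 + 0.24 + 0.24) makes "a" the more probable transcript. This path merging is the main reason beam search beats greedy decoding even before a language model is added.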

Quick Start & Requirements

Highlighted Details

  • Hotword boosting for domain-specific accuracy.
  • Multi-LM support for combining multiple language models.
  • Native frame index annotation for word timing and confidence scores.
  • Batch decoding support via multiprocessing.
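Frame annotation is what makes word timing possible. Each CTC output frame covers a fixed slice of audio, so the first and last frames in which a word's characters are emitted give its start and end times. The sketch below derives word spans from a single best-path label sequence and converts them to seconds. The 20 ms frame stride (typical of Wav2Vec2-style models), the space-as-word-separator convention, and the helper name `word_frame_spans` are assumptions. pyctcdecode derives its annotations from its own beam search rather than from a single best path, so its output format may differ.

```python
from typing import List, Optional, Tuple

BLANK = 0              # assumption: index 0 is the CTC blank token
FRAME_STRIDE_S = 0.02  # assumption: 20 ms of audio per output frame


def word_frame_spans(
    best_path: List[int], labels: List[str], space: str = " "
) -> List[Tuple[str, int, int]]:
    """Group a per-frame label path into (word, start_frame, end_frame) spans.

    end_frame is exclusive. Repeats are merged and blanks dropped, as in CTC.
    """
    spans: List[Tuple[str, int, int]] = []
    chars: List[str] = []
    start: Optional[int] = None
    end: Optional[int] = None
    prev: Optional[int] = None
    for t, idx in enumerate(best_path):
        label = labels[idx]
        if idx != BLANK and label != space:
            if start is None:
                start = t  # first frame of the current word
            if idx != prev:
                chars.append(label)  # new emission (not a repeat)
            end = t + 1  # word extends at least through this frame
        elif label == space and chars:
            # A word separator closes the current word.
            spans.append(("".join(chars), start, end))
            chars, start, end = [], None, None
        prev = idx
    if chars:
        spans.append(("".join(chars), start, end))
    return spans


labels = ["", " ", "h", "i", "y", "o"]
best_path = [2, 2, 3, 0, 1, 0, 4, 0, 5, 5]  # "hh i _ ␣ _ y _ o o"
for word, s, e in word_frame_spans(best_path, labels):
    print(word, round(s * FRAME_STRIDE_S, 2), round(e * FRAME_STRIDE_S, 2))
# hi 0.0 0.06
# yo 0.12 0.2
```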

Maintenance & Community

  • Developed by Kensho Technologies, LLC.
  • No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

  • Licensed under the Apache 2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README notes that the default hyperparameter values are tuned for a specific use case and recommends that users tune them on their own data for best results, especially for non-English languages.
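As a rough illustration of such tuning, the sketch below grid-searches the two knobs commonly used in n-gram shallow fusion: an LM weight (alpha) and a per-word insertion bonus (beta), chosen to minimize word error rate on a small dev set. In pyctcdecode these correspond to the `alpha` and `beta` arguments of `build_ctcdecoder`, but the library's exact scoring may differ from the `fused_score` formula used here. To stay self-contained, the sketch rescores fixed n-best lists instead of re-running the decoder, which is a simplification. The dev data and all helper names are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple

# Hypothetical candidate: (text, ctc_log_prob, lm_log_prob)
Candidate = Tuple[str, float, float]


def fused_score(cand: Candidate, alpha: float, beta: float) -> float:
    """Shallow-fusion score: CTC log-prob + alpha * LM log-prob + beta per word."""
    text, ctc_lp, lm_lp = cand
    return ctc_lp + alpha * lm_lp + beta * len(text.split())


def word_errors(ref: str, hyp: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (rw != hw))
        prev = cur
    return prev[-1]


def grid_search(
    dev: Sequence[Tuple[str, List[Candidate]]],
    alphas: Sequence[float],
    betas: Sequence[float],
) -> Tuple[float, float, float]:
    """Return the (alpha, beta, WER) triple with the lowest corpus WER on dev."""
    total_words = sum(len(ref.split()) for ref, _ in dev)
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        errors = 0
        for ref, nbest in dev:
            hyp = max(nbest, key=lambda c: fused_score(c, alpha, beta))[0]
            errors += word_errors(ref, hyp)
        wer = errors / total_words
        if best is None or wer < best[2]:
            best = (alpha, beta, wer)
    return best


# Toy dev set: reference transcript plus an n-best list of candidates
dev = [
    ("the cat sat", [("the cat sat", -5.0, -6.0), ("the cat sad", -4.5, -9.0)]),
    ("a dog ran", [("a dog ran", -3.8, -6.0), ("a dog", -3.5, -5.0)]),
]
print(grid_search(dev, alphas=[0.0, 0.5, 1.0], betas=[0.0, 1.0]))  # → (0.5, 1.0, 0.0)
```

With alpha = 0 the acoustically preferred but wrong "sad" wins, and with beta = 0 the shorter hypothesis that drops "ran" wins. Only the combination of LM weight and word bonus recovers both references, which is why the two parameters are usually tuned jointly.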

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
