pyctcdecode by kensho-technologies

CTC beam search decoder for speech recognition

created 4 years ago
453 stars

Top 67.6% on sourcepulse

Project Summary

This library provides a fast, Python-based CTC beam search decoder for speech recognition. It is aimed at researchers and developers working with CTC acoustic models such as NVIDIA's Conformer-CTC or Facebook's Wav2Vec2. It offers advanced features such as BPE vocabulary support, hotword boosting, and real-time decoding, and aims for speed comparable to C++ implementations.

How It Works

The decoder implements CTC beam search in Python, leveraging optimizations like caching and beam pruning to achieve performance competitive with C++ implementations. It supports n-gram language models (e.g., KenLM) and integrates features like byte pair encoding (BPE) vocabulary handling and stateful language model decoding for real-time applications. This Python-centric approach facilitates rapid prototyping and experimentation with new features.
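The two pruning ideas mentioned above can be illustrated with a generic CTC prefix beam search. This is a minimal, self-contained sketch and not pyctcdecode's implementation: it decodes characters, uses no language model, and its `beam_width` and `token_min_logp` parameters are this sketch's own. Token pruning skips symbols whose per-frame log-probability falls below a threshold; beam pruning keeps only the highest-scoring prefixes after each frame.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def prefix_beam_search(
    log_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    blank: int = 0,
    beam_width: int = 8,
    token_min_logp: float = -5.0,
) -> str:
    # Each prefix maps to (log P ending in blank, log P ending in non-blank).
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        # Token pruning: ignore symbols that are very unlikely at this frame,
        # falling back to the single best symbol if none pass the threshold.
        candidates = [t for t, lp in enumerate(frame) if lp >= token_min_logp]
        if not candidates:
            candidates = [max(range(len(frame)), key=lambda t: frame[t])]
        nxt: Dict[Tuple[int, ...], Tuple[float, float]] = defaultdict(
            lambda: (NEG_INF, NEG_INF)
        )
        for prefix, (p_b, p_nb) in beams.items():
            for t in candidates:
                lp = frame[t]
                if t == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                ext = prefix + (t,)
                b, nb = nxt[ext]
                if prefix and prefix[-1] == t:
                    # A repeated symbol only starts a new character after a blank...
                    nxt[ext] = (b, logsumexp(nb, p_b + lp))
                    # ...otherwise it collapses into the existing last character.
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    nxt[ext] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Beam pruning: keep only the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams, key=lambda p: logsumexp(*beams[p]))
    return "".join(labels[t] for t in best)
```

Each row of `log_probs` is assumed to hold per-frame log-probabilities (for example, a log-softmax of the acoustic model output), with index 0 as the CTC blank. A real decoder like the one described here would additionally add language-model scores when a word is completed.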

Quick Start & Requirements

Highlighted Details

  • Hotword boosting for domain-specific accuracy.
  • Multi-LM support for combining multiple language models.
  • Native frame index annotation for word timing and confidence scores.
  • Batch decoding support via multiprocessing.
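Of the features listed above, frame-index annotation is the easiest to illustrate in isolation. The sketch below swaps in greedy (best-path) decoding instead of beam search: it takes per-frame argmax token ids, applies the CTC collapsing rule, and records the first and last frame of each word. The function name `words_with_frames` and the space delimiter are assumptions for illustration, not pyctcdecode's API.

```python
from typing import List, Optional, Sequence, Tuple


def words_with_frames(
    token_ids: Sequence[int],
    labels: Sequence[str],
    blank: int = 0,
    delimiter: str = " ",
) -> List[Tuple[str, int, int]]:
    """Collapse a per-frame best path into (word, start_frame, end_frame) tuples.

    Frame indices are inclusive; repeated symbols not separated by a blank are
    merged, following the usual CTC collapsing rule.
    """
    words: List[Tuple[str, int, int]] = []
    chars: List[str] = []
    start = end = -1
    prev: Optional[int] = None
    for frame, t in enumerate(token_ids):
        if t == blank:
            prev = t
            continue
        ch = labels[t]
        if ch == delimiter:
            # A word boundary: flush the word collected so far, if any.
            if chars:
                words.append(("".join(chars), start, end))
            chars = []
        else:
            if t != prev:
                # A new character (not a CTC repeat of the previous frame).
                if not chars:
                    start = frame
                chars.append(ch)
            end = frame  # repeated frames still extend the word's span
        prev = t
    if chars:
        words.append(("".join(chars), start, end))
    return words
```

To convert frame indices into seconds, multiply by the acoustic model's frame duration, which is model-specific (for example, 0.02 s if each output frame covers 20 ms).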

Maintenance & Community

  • Developed by Kensho Technologies, LLC.
  • No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

  • Licensed under the Apache 2.0 License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The README notes that the default hyperparameter values were tuned for a specific use case and recommends that users tune them on their own data for best results, especially for non-English languages.
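A minimal sketch of that tuning step is a grid search that keeps the hyperparameter pair with the lowest word error rate (WER) on a held-out set. It assumes the two weights are an LM weight `alpha` and a word-insertion bonus `beta`, the usual knobs for shallow fusion with an n-gram LM. `decode_with` is a hypothetical callable that you would implement by configuring the decoder with those values and decoding your dev set.

```python
from itertools import product
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def word_errors(ref: str, hyp: str) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(
                prev[j] + 1,               # deletion of a reference word
                cur[j - 1] + 1,            # insertion of a hypothesis word
                prev[j - 1] + (rw != hw),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total word errors divided by the total number of reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / max(total, 1)


def grid_search(
    decode_with: Callable[[float, float], List[str]],  # hypothetical hook
    refs: Sequence[str],
    alphas: Iterable[float],
    betas: Iterable[float],
) -> Tuple[float, float, float]:
    """Return (best_wer, alpha, beta) over the Cartesian grid of candidates."""
    best: Optional[Tuple[float, float, float]] = None
    for alpha, beta in product(alphas, betas):
        wer = corpus_wer(refs, decode_with(alpha, beta))
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    if best is None:
        raise ValueError("alphas and betas must be non-empty")
    return best
```

Each grid point costs a full decoding pass over the dev set, so a coarse grid followed by a finer grid around the best point keeps the search affordable.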

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days
