Speech-to-Text-Russian by SergeyShk

Speech-to-Text tool for Russian using pykaldi

created 5 years ago
332 stars

Top 83.9% on sourcepulse

Project Summary

This project provides a speech-to-text system specifically for the Russian language, built upon the pykaldi toolkit. It's designed for researchers and developers needing to process Russian audio data, offering tools for segmentation, recognition, and result parsing, along with a web demo and Jupyter notebooks for ease of use.

How It Works

The system uses Kaldi's speech recognition framework through the pykaldi Python bindings, together with a pre-trained Russian acoustic and language model from alphacep. Audio is first segmented to isolate the spoken regions. Each segment is then decoded against Kaldi's decoding graph (HCLG.fst), with i-vectors extracted for speaker adaptation.
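To make the segmentation step concrete, here is a minimal sketch of energy-based speech segmentation in plain Python. It is not the project's segmenter, since this summary does not say which method the project uses. The function name `segment_by_energy`, the 25 ms frame size, and the threshold are all illustrative assumptions.

```python
import math
from typing import List, Tuple


def segment_by_energy(
    samples: List[float],
    sample_rate: int = 16000,
    frame_ms: int = 25,
    threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame RMS exceeds `threshold`.

    Hypothetical helper for illustration only; a production system would
    normally use a trained voice activity detector, not a fixed threshold.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = math.ceil(len(samples) / frame_len)
    segments: List[Tuple[float, float]] = []
    start = None  # frame index where the currently open segment began

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > threshold and start is None:
            start = i  # silence -> speech: open a segment
        elif rms <= threshold and start is not None:
            # speech -> silence: close the segment at this frame boundary
            segments.append(
                (start * frame_len / sample_rate, i * frame_len / sample_rate)
            )
            start = None

    if start is not None:
        # Audio ended while still in speech: close at the end of the signal.
        segments.append(
            (start * frame_len / sample_rate, len(samples) / sample_rate)
        )
    return segments
```

In a pipeline like the one described above, each returned span would be cut out of the audio and passed to the recognizer separately, so that long silences are never decoded.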

Quick Start & Requirements

  • Installation: Requires manual installation of Kaldi (Linux), Python libraries (pip install -r requirements.txt), and pykaldi (conda recommended for GPU support). Kaldi's binary paths must be added to the system's PATH.
  • Prerequisites: Linux OS, Kaldi installation, Python 3.x, Git LFS for the HCLG.fst model file (over 500MB).
  • Docker: Pre-built Docker image available (ghcr.io/sergeyshk/stt-ru:0.2.0) or build from source.
  • Resources: Significant disk space for Kaldi and the model files. GPU recommended for performance.
  • Docs: Kaldi Setup, pykaldi, Project Repo
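
Two of the prerequisites above are easy to get wrong silently: Kaldi binaries missing from PATH, and an HCLG.fst that is only a Git LFS pointer because LFS was not installed at clone time. The sketch below checks both with the standard library only. The helper names are hypothetical, and which Kaldi binaries the project actually calls is an assumption; substitute the ones your setup needs.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Every Git LFS pointer file begins with this line (Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def missing_binaries(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` is an LFS pointer stub, not the real file."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

For example, `missing_binaries(["compute-mfcc-feats"])` returns an empty list when that Kaldi tool is reachable, and `is_lfs_pointer(Path("HCLG.fst"))` returning `True` means the real model still has to be fetched (for instance with `git lfs pull`). The path shown is a placeholder, not the repository's actual layout.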

Highlighted Details

  • Includes a web demo and Jupyter notebooks for examples.
  • Supports directory monitoring for continuous processing.
  • Offers command-line options for detailed control over recognition parameters.
  • Uses a pre-trained Russian model from alphacep.
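
The directory monitoring mentioned above can be pictured as a simple polling loop. The following is a sketch under stated assumptions, not the project's implementation: `watch_directory`, its parameters, and the `*.wav` pattern are hypothetical, and `handle` stands in for whatever function runs recognition on one file.

```python
import time
from pathlib import Path
from typing import Callable, Optional, Set


def watch_directory(
    directory: Path,
    handle: Callable[[Path], None],
    pattern: str = "*.wav",
    poll_seconds: float = 1.0,
    max_polls: Optional[int] = None,
) -> Set[Path]:
    """Call `handle` once for each new file matching `pattern`.

    Runs forever when `max_polls` is None; otherwise stops after that many
    scans and returns the set of files that were handled.
    """
    seen: Set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        # Sorted so files are processed in a stable, predictable order.
        for path in sorted(directory.glob(pattern)):
            if path not in seen:
                handle(path)
                seen.add(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)  # wait before the next scan
    return seen
```

A real watcher would also need to avoid picking up files that are still being written, for example by waiting until a file's size stops changing between two scans.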

Maintenance & Community

No specific community channels (Discord/Slack) or roadmap are mentioned in the README. The project appears to be maintained by a single contributor, SergeyShk.

Licensing & Compatibility

The README does not explicitly state a license. The underlying Kaldi toolkit has a permissive Apache 2.0 license. The alphacep model license is not specified. Compatibility for commercial use is undetermined without a clear license.

Limitations & Caveats

The setup process is complex, requiring manual Kaldi installation and configuration. The large size of the HCLG.fst file necessitates Git LFS or manual download. The project's maintenance status and community support are not clearly indicated.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
