Speech-to-Text-Russian by SergeyShk

Speech-to-Text tool for Russian using pykaldi

Created 6 years ago

342 stars

Top 81.1% on SourcePulse

Project Summary

This project provides a speech-to-text system specifically for the Russian language, built upon the pykaldi toolkit. It's designed for researchers and developers needing to process Russian audio data, offering tools for segmentation, recognition, and result parsing, along with a web demo and Jupyter notebooks for ease of use.

How It Works

The system runs Kaldi's speech recognition framework through the pykaldi Python bindings, using pre-trained Russian acoustic and language models from alphacep. Audio is first segmented to isolate the stretches that contain speech. Each segment is then decoded against Kaldi's decoding graph (HCLG.fst), with i-vectors extracted for speaker adaptation.
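To make the segmentation step concrete, here is a minimal sketch of an energy-based speech detector in plain Python. It is not the project's segmenter, which relies on Kaldi; it swaps in a much simpler technique (thresholding per-frame RMS energy) purely to illustrate what "isolating speech segments" means. The function name segment_speech and every parameter default are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def segment_speech(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,          # assumed frame size, not taken from the project
    threshold: float = 0.02,     # assumed RMS threshold for samples in [-1.0, 1.0]
    min_speech_ms: int = 90,     # assumed minimum segment length
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame RMS exceeds the threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    spans = []       # (first_frame, end_frame_exclusive) pairs
    start = None     # index of the frame where the current span began

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        is_speech = rms > threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, n_frames))

    frame_sec = frame_len / sample_rate
    min_frames = min_speech_ms / frame_ms
    # Drop spans too short to be speech, then convert frame indices to seconds.
    return [(s * frame_sec, e * frame_sec) for s, e in spans if (e - s) >= min_frames]
```

Each returned span would then be passed to the recognizer on its own. A production segmenter also has to cope with background noise and short pauses inside an utterance, which a fixed threshold does not handle.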

Quick Start & Requirements

Installation: Requires manual installation of Kaldi (Linux), the Python libraries (pip install -r requirements.txt), and pykaldi (conda recommended for GPU support). The directories containing Kaldi's binaries must be added to PATH.
Prerequisites: Linux OS, Kaldi installation, Python 3.x, Git LFS for the HCLG.fst model file (over 500MB).
Docker: Pre-built Docker image available (ghcr.io/sergeyshk/stt-ru:0.2.0) or build from source.
Resources: Significant disk space for Kaldi and the model files. GPU recommended for performance.
Docs: Kaldi Setup, pykaldi, Project Repo
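Two of the requirements above are easy to get wrong without noticing: Kaldi binaries missing from PATH, and an HCLG.fst that is still a Git LFS pointer file rather than the real model. The sketch below checks both using only the standard library. The helper names are hypothetical, and which Kaldi binaries and which model path to check are left as inputs because the summary does not list them. The pointer check relies on the fact that Git LFS pointer files begin with the line "version https://git-lfs.github.com/spec/v1".

```python
import shutil
from typing import Iterable, List

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def is_lfs_pointer(path: str) -> bool:
    """Return True if `path` is an unfetched Git LFS pointer, not the real file."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


# Example use (binary name and model path are placeholders for illustration):
#   missing_tools(["git-lfs", "compute-mfcc-feats"])
#   is_lfs_pointer("path/to/HCLG.fst")
```

If is_lfs_pointer returns True for the model file, running git lfs pull in the repository (or downloading the file manually) should replace the pointer with the actual graph.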

Highlighted Details

Includes a web demo and Jupyter notebooks for examples.
Supports directory monitoring for continuous processing.
Offers command-line options for detailed control over recognition parameters.
Uses a pre-trained Russian model from alphacep.
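The directory-monitoring feature can be pictured as a polling loop that hands each newly arrived audio file to the recognizer. The following is a minimal sketch under that assumption, not the project's implementation: find_new_audio, watch, the .wav filter, and the polling interval are all hypothetical names and defaults chosen for illustration.

```python
import os
import time
from typing import Callable, Iterable, List, Optional, Set


def find_new_audio(
    directory: str,
    seen: Set[str],
    extensions: Iterable[str] = (".wav",),  # assumed input format
) -> List[str]:
    """Return audio files in `directory` not seen before, and record them in `seen`."""
    new_files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if path in seen or not os.path.isfile(path):
            continue
        if name.lower().endswith(tuple(extensions)):
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(
    directory: str,
    handle: Callable[[str], None],
    poll_seconds: float = 2.0,
    max_polls: Optional[int] = None,  # None means poll forever
) -> None:
    """Poll `directory` and call `handle(path)` once for each new audio file."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_audio(directory, seen):
            handle(path)
        polls += 1
        time.sleep(poll_seconds)
```

In practice handle would run segmentation and recognition on the file. A simple poller like this can pick up a file that is still being written, so a real deployment would likely wait until the file size stops changing or have the writer rename the file into place when done.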

Maintenance & Community

No specific community channels (Discord/Slack) or roadmap are mentioned in the README. The project appears to be maintained by a single contributor, SergeyShk.

Licensing & Compatibility

The README does not explicitly state a license. The underlying Kaldi toolkit is released under the permissive Apache 2.0 license, but the license of the alphacep model is not specified. Without a stated license for the project itself, its suitability for commercial use cannot be determined.

Limitations & Caveats

The setup process is complex, requiring manual Kaldi installation and configuration. The large size of the HCLG.fst file necessitates Git LFS or manual download. The project's maintenance status and community support are not clearly indicated.
