voicekit-team
Streaming ASR pipeline for Russian and English
Top 98.5% on SourcePulse
Summary
T-one is a high-performance, streaming, CTC-based Automatic Speech Recognition (ASR) pipeline engineered specifically for the Russian language and the telephony domain. It offers a ready-to-use solution for real-time transcription, aimed at developers and researchers who need low-latency, high-throughput speech-to-text.
How It Works
The pipeline employs a Conformer-based acoustic model that processes audio in 300 ms chunks, preserving acoustic context across segment boundaries via hidden states. A novel log-probability splitter identifies phrase boundaries by detecting speech and silence frames, outputting phrases with timestamps. Transcription is finalized using either greedy decoding or a KenLM-based CTC beam search decoder, providing modularity and adaptability.
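The two post-processing steps described above can be illustrated with a minimal sketch. This is not T-one's actual code: the function names, the blank index, and the silence rule (a frame counts as silence when the blank token's log-probability exceeds a threshold) are assumptions made for illustration. The greedy decoder follows the standard CTC rule of collapsing repeated tokens and then dropping blanks.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank token


def greedy_ctc_decode(frames: Sequence[Sequence[float]], vocab: Sequence[str]) -> str:
    """Pick the best token per frame, collapse repeats, then drop blanks."""
    out: List[str] = []
    prev = BLANK
    for logp in frames:
        best = max(range(len(logp)), key=logp.__getitem__)
        # Emit only when the token changes and is not the blank.
        if best != prev and best != BLANK:
            out.append(vocab[best])
        prev = best
    return "".join(out)


def split_phrases(
    frames: Sequence[Sequence[float]],
    min_silence_frames: int = 2,   # hypothetical parameter
    silence_logp: float = -0.5,    # hypothetical threshold on the blank log-prob
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, one per phrase.

    Assumption: a frame is silence when the blank log-probability is above
    `silence_logp`; a phrase closes after `min_silence_frames` such frames.
    """
    phrases: List[Tuple[int, int]] = []
    start = None
    last_speech = -1
    silent_run = 0
    for i, logp in enumerate(frames):
        if logp[BLANK] > silence_logp:
            silent_run += 1
            if start is not None and silent_run >= min_silence_frames:
                phrases.append((start, last_speech + 1))
                start = None
        else:
            silent_run = 0
            if start is None:
                start = i
            last_speech = i
    if start is not None:
        # Flush a phrase that is still open when the input ends.
        phrases.append((start, last_speech + 1))
    return phrases
```

Each range can be decoded on its own with `greedy_ctc_decode(frames[start:end], vocab)`, and multiplying the frame indices by the model's frame duration would give timestamps. A beam search decoder with a language model could replace the greedy step without changing the splitter.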
Quick Start & Requirements
A pre-built Docker image serves a web-based demo at http://localhost:8080. For local development, Python 3.9+ and Poetry 2.1+ are required. Installation involves cloning the repository, setting up a virtual environment, and running make install and make up_dev, or running poetry install -E demo and then starting the web service with uvicorn. Linux or macOS is recommended; Windows users should use WSL because of the KenLM dependency. At least 4 CPU cores and 8 GB of RAM are advised for smooth demo performance.
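Before installing, the stated requirements can be checked with a small preflight script. The sketch below is a hypothetical helper, not part of the project. It covers only the Python version, the presence of Poetry on PATH, and the CPU core count; it does not verify the Poetry version or the available RAM.

```python
import os
import shutil
import sys
from typing import List, Optional, Tuple


def check_requirements(
    python_version: Tuple[int, int],
    cpu_cores: Optional[int],
    has_poetry: bool,
) -> List[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems: List[str] = []
    if python_version < (3, 9):
        problems.append("Python 3.9+ is required")
    if not has_poetry:
        problems.append("Poetry was not found on PATH")
    # os.cpu_count() may return None, in which case the check is skipped.
    if cpu_cores is not None and cpu_cores < 4:
        problems.append("fewer than 4 CPU cores; the demo may be slow")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        python_version=sys.version_info[:2],
        cpu_cores=os.cpu_count(),
        has_poetry=shutil.which("poetry") is not None,
    )
    print("\n".join(issues) or "Environment looks OK")
```

Keeping the check as a pure function of its inputs makes it easy to test without depending on the machine it runs on.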
Highlighted Details
Maintenance & Community
The provided README does not detail specific contributors, community channels (e.g., Discord, Slack), sponsorships, or a public roadmap.
Licensing & Compatibility
The project is released under the Apache 2.0 License, which permits commercial use and integration into closed-source applications.
Limitations & Caveats
The KenLM dependency lacks official Windows support, necessitating the use of WSL or containerized environments for Windows users, which can complicate setup and introduce potential dependency issues.