speech-recognition-uk by egorsmkv

Resource collection for Ukrainian speech AI

Created 6 years ago

438 stars

Top 67.4% on SourcePulse

Project Summary

This repository is a curated collection of models, datasets, and tools for Ukrainian speech processing, covering both Speech-to-Text (STT) and Text-to-Speech (TTS). It targets researchers, developers, and power users working with Ukrainian audio data, and gathers resources and benchmarks in one place to speed up development and evaluation.

How It Works

The project aggregates links to pre-trained models hosted on platforms like Hugging Face, covering architectures such as wav2vec2, HuBERT, Citrinet, Conformer, and Whisper. It also lists custom fine-tuned versions and quantized variants intended to reduce inference cost. For practical use, the repository links to interactive demos and reports benchmarks (word error rate, WER; character error rate, CER; and accuracy) on the Common Voice 10 test split, so users can compare models on the same data.
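The WER and CER figures mentioned above are both edit-distance ratios: WER counts word-level edits, CER counts character-level edits, each divided by the reference length. The following is a minimal sketch of those definitions in plain Python. It is not the repository's evaluation script; the function names are hypothetical, and any text normalization (casing, punctuation) that the published benchmarks may apply is not reproduced here.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative strings only, not taken from any benchmark.
ref = "добрий день як справи"
hyp = "добрий день як справа"
print(f"WER: {wer(ref, hyp):.2f}")  # one of four words differs: 0.25
print(f"CER: {cer(ref, hyp):.4f}")
```

Lower is better for both metrics. A single wrong character costs a whole word in WER but only one character in CER, which is why the two numbers can diverge sharply for the same model.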

Quick Start & Requirements

Models are primarily accessed via Hugging Face model IDs (e.g., Yehor/w2v-bert-uk-v2.1).
Many models require PyTorch and the Hugging Face transformers library.
Specific models may have additional dependencies (e.g., NVIDIA NeMo for Citrinet/ContextNet).
GPU acceleration is highly recommended for efficient inference and training.
Links to demos and specific model repositories are provided for immediate testing.
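As a rough illustration of the requirements above, the sketch below loads a model by its Hugging Face ID through the transformers `pipeline` API and picks a GPU when one is available. It assumes that `torch` and `transformers` are installed, that the installed transformers version supports the chosen model, that the model works with the generic automatic-speech-recognition pipeline, and that ffmpeg is present for decoding audio files. The helper names and the audio path are hypothetical; check each model card for its recommended usage.

```python
from typing import Any, Dict

# Model ID quoted in the list above; any other ASR model ID can be substituted.
MODEL_ID = "Yehor/w2v-bert-uk-v2.1"


def asr_pipeline_kwargs(model_id: str, use_gpu: bool) -> Dict[str, Any]:
    """Build keyword arguments for transformers.pipeline (hypothetical helper)."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        # transformers convention: 0 selects the first CUDA device, -1 selects CPU
        "device": 0 if use_gpu else -1,
    }


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file; downloads the model on first use."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from transformers import pipeline

    asr = pipeline(**asr_pipeline_kwargs(model_id, torch.cuda.is_available()))
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` with a local recording (the file name is a placeholder) would return the recognized text. Models that depend on other toolkits, such as the NeMo-based Citrinet/ContextNet checkpoints, need their own loading code and are not covered by this sketch.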

Highlighted Details

Extensive collection of STT models, including fine-tuned Whisper variants and NVIDIA's Citrinet/ContextNet.
Comprehensive benchmarks comparing various STT architectures on Ukrainian data.
Curated list of Ukrainian TTS models (StyleTTS2, RAD-TTS, Coqui TTS, FastPitch) with associated datasets.
Links to useful related tools like Ukrainian language models, IPA converters, and forced aligners.

Maintenance & Community

Active community support via Discord and Telegram channels.
No update cadence is stated; the breadth of content suggests that models and datasets are added over time.
Links to related projects and initiatives like "Speech-UK" are provided.

Licensing & Compatibility

Licenses vary by individual model and dataset; users must verify each resource.
Many models are released under permissive licenses (e.g., MIT, Apache 2.0), facilitating commercial use.
Some datasets may have specific usage terms.

Limitations & Caveats

This repository is primarily a curated list of links, not a unified framework; integration may require custom scripting.
Performance can vary significantly between models and datasets, requiring careful selection based on specific use cases.
Some older models (e.g., DeepSpeech) may perform worse than current state-of-the-art architectures.

Health Check

Last Commit: 10 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

2 stars in the last 30 days
