Foundational models for Russian speech processing
GigaAM is a family of open-source acoustic models for Russian speech processing, offering state-of-the-art performance in Automatic Speech Recognition (ASR) and emotion recognition. It provides foundational self-supervised models plus fine-tuned variants for ASR (CTC and RNN-T) and for emotion recognition. It targets researchers and developers working with Russian-language audio data.
How It Works
GigaAM's foundational models use the Conformer architecture, which combines self-attention with convolution. They are pre-trained on extensive Russian speech datasets with self-supervised learning (a wav2vec2-like approach for v1, a HuBERT-like approach for v2). Pre-training yields robust feature extraction, and the models are then fine-tuned for specific downstream tasks such as ASR and emotion recognition, where the project reports superior results on Russian-language data.
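To make the CTC variant of the fine-tuned ASR models a little more concrete, here is a minimal sketch of greedy CTC decoding, the generic step that turns per-frame scores into text. It is an illustration of the general technique only, not GigaAM's own code: the function name, the toy vocabulary, and the blank index of 0 are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,  # assumption: index 0 is the CTC blank symbol
) -> str:
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    # Highest-scoring vocabulary index for every frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    decoded: List[str] = []
    prev: Optional[int] = None
    for idx in best_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != blank_id:
            decoded.append(vocab[idx])
        prev = idx
    return "".join(decoded)


# Hypothetical three-symbol vocabulary and five frames of scores.
vocab = ["<blank>", "д", "а"]
frames = [
    [0.1, 0.8, 0.1],    # "д"
    [0.1, 0.7, 0.2],    # "д" again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # "а"
    [0.1, 0.1, 0.8],    # "а" again, merged
]
print(ctc_greedy_decode(frames, vocab))  # prints "да"
```

A blank between two identical symbols keeps them separate, which is how CTC represents doubled letters.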
Quick Start & Requirements
Install: pip install -e . after cloning the repository.
Prerequisite: ffmpeg installed and in PATH.
Optional longform extra: pip install gigaam[longform]; requires a Hugging Face API token for the pyannote.audio dependencies.
Example notebook: inference_example.ipynb
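The requirements above can be checked before running anything. The following is a small pre-flight sketch using only the Python standard library. The helper name is hypothetical, and the environment variable name `HF_TOKEN` is an assumption: the source only says a Hugging Face API token is required, not how it must be supplied.

```python
import os
import shutil
from typing import List


def missing_prerequisites(need_longform: bool = False, token_var: str = "HF_TOKEN") -> List[str]:
    """Return human-readable descriptions of any unmet requirement."""
    problems: List[str] = []

    # ffmpeg must be installed and discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Assumption: the Hugging Face token is passed via an environment variable.
    if need_longform and not os.environ.get(token_var):
        problems.append(f"{token_var} is not set (Hugging Face token for pyannote.audio)")

    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(need_longform=True):
        print("missing:", problem)
```

An empty list means both checks passed. Pass `need_longform=False` to skip the token check when the longform extra is not installed.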
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats