salute-developers
Foundational models for Russian speech processing
Top 85.7% on SourcePulse
GigaAM is a family of open-source acoustic models for Russian speech processing, offering state-of-the-art performance in Automatic Speech Recognition (ASR) and emotion recognition. It provides foundational self-supervised models plus fine-tuned variants for ASR (CTC and RNN-T) and emotion recognition, and targets researchers and developers working with Russian-language audio data.
How It Works
GigaAM's foundational models use the Conformer architecture, a hybrid that combines self-attention with convolution. They are pre-trained on extensive Russian speech datasets with self-supervised learning (a wav2vec 2.0-like objective for v1, a HuBERT-like objective for v2). Pre-training yields robust feature extraction, and the models are then fine-tuned for specific downstream tasks such as ASR and emotion recognition, achieving superior results on Russian-language data.
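For reference, the standard Conformer block places a self-attention module and a convolution module between two half-step feed-forward layers, each with a residual connection. The equations below give that generic formulation; they are not a confirmed description of GigaAM's exact layers, and the symbols (x for the block input, y for its output) are chosen here for illustration.

```latex
\begin{aligned}
\tilde{x} &= x + \tfrac{1}{2}\,\mathrm{FFN}_1(x) \\
x' &= \tilde{x} + \mathrm{MHSA}(\tilde{x}) \\
x'' &= x' + \mathrm{Conv}(x') \\
y &= \mathrm{LayerNorm}\!\left(x'' + \tfrac{1}{2}\,\mathrm{FFN}_2(x'')\right)
\end{aligned}
```

Here MHSA is multi-head self-attention, which captures long-range context, and Conv is the convolution module, which captures local patterns in the audio frames.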
Quick Start & Requirements
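The prerequisites in this section (ffmpeg on PATH, and a Hugging Face token for long-form use) can be checked before running anything. The sketch below is one way to do that in Python. The HF_TOKEN variable name, the "ctc" model name, and the gigaam.load_model and transcribe calls are assumptions that this page does not confirm, so verify them against inference_example.ipynb before relying on them.

```python
import os
import shutil


def check_prerequisites(longform: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    # ffmpeg must be installed and discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Assumption: the Hugging Face token is supplied via the HF_TOKEN variable.
    if longform and not os.environ.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for pyannote.audio)")
    return problems


def transcribe(audio_path: str, model_name: str = "ctc") -> str:
    """Transcribe a short recording (assumed API, see the note above)."""
    import gigaam  # imported lazily so the checks above work without it

    model = gigaam.load_model(model_name)  # assumed loader name
    return model.transcribe(audio_path)  # assumed method name


if __name__ == "__main__":
    issues = check_prerequisites(longform=True)
    print("ready" if not issues else "\n".join(issues))
```

Calling check_prerequisites() with no arguments checks only for ffmpeg; pass longform=True when the gigaam[longform] extra is in use.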
Install: pip install -e . after cloning the repository.
Prerequisite: ffmpeg installed and in PATH.
Long-form audio: pip install gigaam[longform]; requires a Hugging Face API token for the pyannote.audio dependencies.
Example notebook: inference_example.ipynb.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
3 months ago
Inactive