GigaAM by salute-developers

Foundational models for Russian speech processing

Created 1 year ago

470 stars

Top 64.9% on SourcePulse

Project Summary

GigaAM is a family of open-source acoustic models for Russian speech processing, offering state-of-the-art performance in Automatic Speech Recognition (ASR) and emotion recognition. It provides foundational self-supervised models and fine-tuned variants for ASR (CTC and RNN-T) and emotion recognition, targeting researchers and developers working with Russian language audio data.

How It Works

GigaAM's foundational models use the Conformer architecture, a hybrid design that combines self-attention with convolution. They are pre-trained on large Russian speech datasets with self-supervised learning (a wav2vec2-like objective for v1, a HuBERT-like objective for v2). The pre-trained encoder provides robust acoustic features and is then fine-tuned for specific downstream tasks such as ASR and emotion recognition, which the project reports yields stronger results on Russian-language data.
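For the CTC fine-tuned variants, the acoustic model emits one label per audio frame, and a transcript is recovered by collapsing repeated labels and dropping blanks. The following is a minimal sketch of greedy CTC decoding in general, not GigaAM's actual decoder; the vocabulary, the blank index, and the frame labels are hypothetical values chosen for illustration.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical)


def ctc_greedy_decode(
    frame_ids: Sequence[int], vocab: Sequence[str], blank: int = BLANK
) -> str:
    """Collapse consecutive repeats, then drop blank labels."""
    out: List[str] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a symbol only when the label changes and is not the blank.
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


# Hypothetical toy vocabulary and per-frame argmax labels.
vocab = ["<blank>", "д", "а", "н", "е", "т"]
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0]
decoded = ctc_greedy_decode(frames, vocab)
print(decoded)
```

A blank between two identical labels is what lets CTC produce a doubled character, which is why repeats are collapsed before blanks are removed.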

Quick Start & Requirements

Clone the repository, then install from its root with pip install -e .
Requires Python ≥ 3.8 and ffmpeg, which must be installed and available in PATH.
For long-form ASR, install with pip install gigaam[longform]; this also requires a Hugging Face API token for the pyannote.audio dependencies.
Official examples are available in inference_example.ipynb.
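The requirements above can be checked before installing. This is a small sketch using only the Python standard library; the function name check_prerequisites is hypothetical, and reading the token from the HF_TOKEN environment variable is an assumption about how the Hugging Face token is supplied, not something this summary specifies.

```python
import os
import shutil
import sys
from typing import Dict

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites appear to be satisfied."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # ffmpeg must be discoverable on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Only needed for long-form ASR; HF_TOKEN is an assumed variable name.
        "hf_token_set": bool(os.environ.get("HF_TOKEN")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing hf_token_set only matters for the long-form extra; short-form transcription does not need it according to the requirements listed here.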

Highlighted Details

GigaAM-v2 models reduce word error rate (WER) compared to v1: -15% for CTC and -12% for RNN-T.
Achieves state-of-the-art results on Russian ASR benchmarks like Golos and OpenSTT.
GigaAM-Emo demonstrates high accuracy for emotion recognition on the Dusha dataset.
Supports ONNX export for efficient inference.
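The WER figures above compare two model versions. As a sketch of how such a number is typically computed, the code below implements word-level edit distance divided by reference length, plus a relative-change helper. The summary does not state whether the reported reductions are relative or absolute, so treating them as relative is an assumption, and the sample sentences and WER values are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new; negative means a reduction."""
    return (new - old) / old


# Illustrative values only, not results from the project.
sample_wer = word_error_rate("привет как дела", "привет так дела")
sample_change = relative_change(0.20, 0.17)
print(sample_wer, sample_change)
```

Under the relative reading, a drop from a WER of 0.20 to 0.17 would count as a 15% reduction, whereas under the absolute reading it would be 3 percentage points.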

Maintenance & Community

The project is developed by salute-developers.
Links to related research papers and YouTube presentations are provided.

Licensing & Compatibility

Released under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

Primarily focused on the Russian language; performance on other languages is not specified.
Long-form ASR requires external dependencies and Hugging Face authentication.

Health Check

Last Commit

1 week ago

Responsiveness

Inactive

Star History

35 stars in the last 30 days