GigaAM by salute-developers

Foundational models for Russian speech processing

Created 1 year ago
293 stars

Top 90.1% on SourcePulse

Project Summary

GigaAM is a family of open-source acoustic models for Russian speech processing, offering state-of-the-art performance in Automatic Speech Recognition (ASR) and emotion recognition. It provides foundational self-supervised models and fine-tuned variants for ASR (CTC and RNN-T) and emotion recognition, targeting researchers and developers working with Russian language audio data.

How It Works

GigaAM's foundational models use the Conformer architecture, a hybrid that combines self-attention with convolution. They are pre-trained on extensive Russian speech datasets with self-supervised learning (wav2vec2-like for v1, HuBERT-like for v2). The pre-trained encoder provides robust feature extraction and is then fine-tuned for specific downstream tasks such as ASR and emotion recognition, which the project reports as yielding strong results on Russian-language data.
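To make the CTC fine-tuning target more concrete, here is a minimal sketch of greedy CTC decoding, the standard way frame-level scores from a CTC head are turned into text. It is a generic illustration and not GigaAM's own decoder; the toy vocabulary, the blank index, and the one-hot scores are all assumptions made up for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring token index for every frame.
    best_ids = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Merge consecutive duplicates (one token usually spans several frames).
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # Remove the blank symbol that CTC uses to separate tokens.
    return "".join(vocab[i] for i in collapsed if i != blank_id)


def one_hot(index, size):
    """Helper for the toy example: a score vector with a single 1.0."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical three-symbol vocabulary; index 0 is the CTC blank.
toy_vocab = ["<blank>", "д", "а"]
# Six frames whose argmax ids are 1, 1, 0, 2, 2, 0.
toy_frames = [one_hot(i, len(toy_vocab)) for i in (1, 1, 0, 2, 2, 0)]
decoded = ctc_greedy_decode(toy_frames, toy_vocab)
```

RNN-T models decode differently (a prediction network conditions each step on previously emitted tokens), so this sketch applies only to the CTC variant.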

Quick Start & Requirements

  • Clone the repository, then install from its root with pip: pip install -e .
  • Requires Python ≥ 3.8 and ffmpeg, installed and available on PATH.
  • For long-form ASR, install with pip install gigaam[longform]; this also requires a Hugging Face API token for the pyannote.audio dependencies.
  • Official examples are available in inference_example.ipynb.
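The prerequisites listed above can be checked before installing. The following is a small sketch using only the Python standard library; the function name check_prerequisites is hypothetical and is not part of GigaAM.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Report whether the interpreter and ffmpeg meet the stated requirements."""
    return {
        # True when the running interpreter is at least min_python.
        "python_ok": sys.version_info >= min_python,
        # Absolute path to ffmpeg if it is found on PATH, otherwise None.
        "ffmpeg_path": shutil.which("ffmpeg"),
    }


if __name__ == "__main__":
    status = check_prerequisites()
    print("Python >= 3.8:", status["python_ok"])
    print("ffmpeg on PATH:", status["ffmpeg_path"] or "not found")
```

If ffmpeg is reported as not found, install it with your system package manager and re-run the check before loading any audio.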

Highlighted Details

  • GigaAM-v2 models show a significant WER reduction compared to v1 (15% for CTC, 12% for RNN-T).
  • Achieves state-of-the-art results on Russian ASR benchmarks like Golos and OpenSTT.
  • GigaAM-Emo demonstrates high accuracy for emotion recognition on the Dusha dataset.
  • Supports ONNX export for efficient inference.
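For readers unfamiliar with the WER figures above, here is a minimal sketch of how word error rate is typically computed (word-level edit distance divided by the reference length) and how a reduction between two model versions can be expressed. Reading the quoted percentages as relative reductions is an assumption, since the summary does not say whether they are relative or absolute, and the numbers in the example are hypothetical rather than GigaAM results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on old_wer."""
    return (old_wer - new_wer) / old_wer


# One substituted word out of three reference words.
example_wer = word_error_rate("привет как дела", "привет так дела")
# Hypothetical WER values, chosen only to illustrate a 15% relative reduction.
example_gain = relative_reduction(10.0, 8.5)
```

Published WER numbers usually also depend on text normalization (casing, punctuation, numerals), which this sketch does not perform.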

Maintenance & Community

  • The project is developed by salute-developers.
  • Links to related research papers and YouTube presentations are provided.

Licensing & Compatibility

  • Released under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Primarily focused on the Russian language; performance on other languages is not specified.
  • Long-form ASR requires external dependencies and Hugging Face authentication.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 39 stars in the last 30 days
