omnilingual-asr by facebookresearch

Multilingual speech recognition for over 1600 languages

Created 8 months ago

2,849 stars

Top 16.1% on SourcePulse

Project Summary

Omnilingual ASR is an open-source speech recognition system designed for broad accessibility. It supports over 1,600 languages, including hundreds that no ASR technology had covered before. It aims to make speech technology more inclusive and adaptable for communities and researchers worldwide by letting new languages be added with minimal data through scalable zero-shot learning.

How It Works

The system employs a flexible model family combining wav2vec 2.0 (W2V) encoders, Connectionist Temporal Classification (CTC), and Large Language Model (LLM) architectures. Its core innovation is scalable zero-shot learning: the model can adapt to a new language from only a few paired audio-text examples, which removes the need for large, language-specific datasets.
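To make the CTC part of the model family concrete, the sketch below shows generic greedy CTC decoding: take the most likely label per audio frame, collapse consecutive repeats, then drop the blank symbol. It is a textbook illustration, not the project's decoder; the toy vocabulary, the blank index of 0, and the function name ctc_greedy_decode are all assumptions made for this example.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0  # assumed index of the CTC blank symbol in this toy vocabulary


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: Sequence[str],
                      blank: int = BLANK) -> str:
    """Collapse consecutive repeated frame labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank)


# Hypothetical per-frame argmax labels for a short utterance.
vocab = ["<blank>", "c", "a", "t"]
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, vocab))  # cat
```

A blank between two identical labels keeps them as separate characters, which is how CTC can emit doubled letters.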

Quick Start & Requirements

Installation: pip install omnilingual-asr or uv add omnilingual-asr.
Prerequisites: libsndfile is required for audio support (e.g., brew install libsndfile on macOS).
Links: Hugging Face Demo, Hugging Face Dataset (facebook/omnilingual-asr-corpus), Paper, Blogpost, Documentation, Quick Start, Inference Guide.
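After installing, a quick sanity check can confirm that both the package and the libsndfile prerequisite are visible to Python. The sketch below uses only the standard library. The import name omnilingual_asr is an assumption inferred from the pip package name, and check_environment is a hypothetical helper, not part of the project.

```python
import ctypes.util
import importlib.util


def check_environment(package: str = "omnilingual_asr") -> dict:
    """Report whether the package and a system libsndfile can be located."""
    return {
        # Assumed import name; adjust if the installed module is named differently.
        "package_installed": importlib.util.find_spec(package) is not None,
        # Looks for a system-wide shared library named "sndfile".
        "libsndfile_found": ctypes.util.find_library("sndfile") is not None,
    }


print(check_environment())
```

A False for libsndfile_found is not conclusive: some Python audio wheels bundle their own copy of the library, which this system-level lookup does not see.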

Highlighted Details

Supports over 1,600 languages, significantly expanding ASR coverage.
The 7B-LLM-ASR model achieves sub-10% character error rates (CER) for 78% of supported languages.
Offers a range of models (300M to 7B parameters) with varying VRAM and inference speed trade-offs.
Facilitates the addition of new languages with minimal paired examples.
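For readers unfamiliar with the metric above, character error rate is commonly defined as the character-level edit distance between the reference transcript and the model output, divided by the reference length. The sketch below is a minimal implementation of that standard definition. It is not the project's evaluation code, and it applies no text normalization, which real evaluations usually do.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


print(f"{cer('speech', 'speach'):.3f}")  # 0.167
```

Under this definition, a CER below 10% means fewer than one character edit per ten reference characters.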

Maintenance & Community

The project is attributed to the "Omnilingual ASR Team" with numerous listed authors. Specific community channels (e.g., Discord, Slack) or explicit roadmap links are not detailed in the provided README.

Licensing & Compatibility

The code and models are released under the Apache 2.0 license, which generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

Currently, the inference pipeline only accepts audio files shorter than 40 seconds. Support for transcribing unlimited-length audio files is planned for a future release.
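Until long-form support arrives, one possible workaround is to split longer recordings into windows that stay under the limit and transcribe each window separately. The sketch below only plans the window boundaries. plan_chunks and the 30-second default are assumptions chosen for illustration, not project APIs, and fixed-size windows can cut words in half, so splitting at silences would likely give better results.

```python
from typing import List, Tuple


def plan_chunks(total_seconds: float,
                max_chunk_seconds: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole recording."""
    if total_seconds < 0 or max_chunk_seconds <= 0:
        raise ValueError("durations must be non-negative and chunk size positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


# A 95-second recording becomes four windows, each shorter than 40 seconds.
print(plan_chunks(95.0))
# [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0), (90.0, 95.0)]
```

Each window can then be cut from the source audio with any audio library and passed to the inference pipeline as its own file.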
