omnilingual-asr  by facebookresearch

Multilingual speech recognition for over 1,600 languages

Created 2 months ago
2,570 stars

Top 18.1% on SourcePulse

Project Summary

Omnilingual ASR is an open-source speech recognition system that supports over 1,600 languages, including hundreds that no ASR system had covered before. It aims to make speech technology more inclusive for communities and researchers worldwide by letting new languages be added with minimal data through scalable zero-shot learning.

How It Works

The system is a model family built on wav2vec 2.0 (W2V) speech encoders, with Connectionist Temporal Classification (CTC) and Large Language Model (LLM) based ASR variants. Its core innovation is scalable zero-shot learning: the model adapts to a new language from only a few paired audio-text examples, which avoids the need for large, language-specific training datasets.
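To make the CTC part of the description concrete, here is a minimal sketch of greedy CTC decoding: take the most likely label per audio frame, collapse consecutive repeats, then drop the blank symbol. This is the generic textbook procedure, not the project's own decoder. The function name `ctc_greedy_decode`, the toy vocabulary, and the blank index are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol in the toy vocabulary


def ctc_greedy_decode(frame_ids, vocab, blank=BLANK):
    """Turn per-frame label ids into text using the standard CTC rule.

    1. Collapse runs of identical consecutive labels into one label.
    2. Remove blank labels, which only separate genuine repeats.
    """
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank)


# Hypothetical vocabulary and frame-level predictions (one id per frame).
vocab = ["<blank>", "h", "e", "l", "o"]
frames = [1, 1, 0, 2, 2, 3, 0, 3, 3, 4, 0]

print(ctc_greedy_decode(frames, vocab))  # prints "hello"
```

The blank between the two runs of `l` is what lets the decoder keep a double letter instead of merging it into one.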

Quick Start & Requirements

  • Installation: pip install omnilingual-asr or uv add omnilingual-asr.
  • Prerequisites: libsndfile is required for audio support (e.g., brew install libsndfile on macOS).
  • Links: Hugging Face Demo, Hugging Face Dataset (facebook/omnilingual-asr-corpus), Paper, Blog post, Documentation, Quick Start, Inference Guide.
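Since libsndfile is a system-level prerequisite, a quick check before installing can save a confusing import error later. The sketch below uses only the Python standard library to ask the platform's loader whether a library named `sndfile` can be located. The helper name `libsndfile_available` is hypothetical, and a negative result is not conclusive: the library may live in a directory the lookup does not search, or an audio package may ship its own copy.

```python
from ctypes.util import find_library


def libsndfile_available():
    """Return True if the system loader can locate a library named 'sndfile'."""
    return find_library("sndfile") is not None


if libsndfile_available():
    print("libsndfile found")
else:
    print("libsndfile not found on the default search path")
```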

Highlighted Details

  • Supports over 1,600 languages, significantly expanding ASR coverage.
  • The 7B-LLM-ASR model achieves sub-10% character error rates (CER) for 78% of supported languages.
  • Offers a range of models (300M to 7B parameters) with varying VRAM and inference speed trade-offs.
  • Facilitates the addition of new languages with minimal paired examples.
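For readers unfamiliar with the metric quoted above, character error rate is commonly defined as the character-level edit distance between the reference transcript and the model output, divided by the reference length. The sketch below follows that common definition; the project's exact text normalization and scoring script are not described here, so treat the function names `edit_distance` and `cer` and the sample strings as illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference character
                            curr[j - 1] + 1,      # extra hypothesis character
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of hypothesis `hyp` against reference `ref`."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One missing character out of eleven: 1 / 11, just under the 10% mark.
print(f"{cer('omnilingual', 'omnilingal'):.1%}")  # prints "9.1%"
```

Because the denominator is the reference length, CER can exceed 100% when the output contains many spurious characters.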

Maintenance & Community

The project is attributed to the "Omnilingual ASR Team" with numerous listed authors. Specific community channels (e.g., Discord, Slack) or explicit roadmap links are not detailed in the provided README.

Licensing & Compatibility

The code and models are released under the Apache 2.0 license, which generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

Currently, the inference pipeline only accepts audio files shorter than 40 seconds. Support for transcribing unlimited-length audio files is planned for a future release.
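Given the 40-second limit, it can help to screen files before handing them to the pipeline. The following sketch uses only the standard-library `wave` module, so it handles uncompressed PCM WAV files and nothing else. The helper names `wav_duration_seconds` and `is_within_limit` are hypothetical and are not part of the omnilingual-asr package.

```python
import wave

MAX_SECONDS = 40.0  # limit stated in this summary; may change in later releases


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_within_limit(path, max_seconds=MAX_SECONDS):
    """True if the file is strictly shorter than the stated limit."""
    return wav_duration_seconds(path) < max_seconds
```

Files that fail the check would need to be split into shorter segments first, ideally at pauses so that words are not cut in half.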

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 4
  • Issues (30d): 14
  • Star history: 136 stars in the last 30 days
