SONAR by facebookresearch

Multilingual/multimodal embeddings for text and speech tasks

Created 2 years ago

871 stars

Top 41.0% on SourcePulse

Project Summary

SONAR provides a fixed-size, multilingual, and multimodal sentence embedding space that its authors report outperforms earlier sentence embeddings such as LASER3 and LaBSE on the xsim and xsim++ cross-lingual similarity search tasks. It ships text and speech encoders and decoders for tasks such as translation and similarity search, and is aimed at researchers and developers who work across many languages and modalities.

How It Works

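SONAR first defines a sentence embedding space for text, then trains speech encoders with a teacher-student approach on speech transcription data so that a speech segment is embedded into the same space as its text. Because both modalities share one space, an encoder for one modality or language can be paired with a decoder for another, which enables cross-modal and zero-shot translation. The architecture keeps encoders and decoders for text and speech separate, so pipelines for different NLP and speech processing tasks can be assembled from the same components.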
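The teacher-student idea can be illustrated with a toy sketch. It assumes a mean-squared-error objective and a purely linear student, and it uses made-up three-dimensional "speech features" and a four-dimensional "teacher" text embedding; none of the names below come from the SONAR code base, and the real encoders are far larger networks.

```python
import random


def mse(prediction, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def student_forward(weights, features):
    """Hypothetical linear student: one output value per weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def train_student(features, teacher_embedding, steps=200, lr=0.05, seed=0):
    """Fit the student so its output approaches the frozen teacher embedding."""
    rng = random.Random(seed)
    dim_out, dim_in = len(teacher_embedding), len(features)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]
    losses = []
    for _ in range(steps):
        prediction = student_forward(weights, features)
        losses.append(mse(prediction, teacher_embedding))
        for i in range(dim_out):
            # Gradient of the MSE with respect to output i.
            grad_out = 2.0 * (prediction[i] - teacher_embedding[i]) / dim_out
            for j in range(dim_in):
                weights[i][j] -= lr * grad_out * features[j]
    return weights, losses


# Toy data (assumed for illustration): pooled speech features and the
# text encoder's embedding of the matching transcript.
speech_features = [0.5, -1.0, 0.25]
teacher_embedding = [0.2, -0.4, 0.1, 0.3]

weights, losses = train_student(speech_features, teacher_embedding)
student_embedding = student_forward(weights, speech_features)
print(f"first loss: {losses[0]:.6f}, last loss: {losses[-1]:.8f}")
```

The teacher embedding is never updated; only the student moves. That is the property that lets a newly trained speech encoder reuse decoders that were trained against the text space.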

Quick Start & Requirements

Install via pip: pip install sonar-space
Requires a fairseq2 build that matches the installed PyTorch and CUDA versions (e.g., for PyTorch 2.6.0 with CUDA 12.4: pip install fairseq2 --extra-index-url https://fair.pkg.atmeta.com/fairseq2/whl/pt2.6.0/cu124).
Models are automatically downloaded to $TORCH_HOME/hub.
GPU acceleration is recommended for performance.
Demo notebooks are available for detailed examples.
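The example index URL above appears to encode the PyTorch version as pt2.6.0 and the CUDA version as cu124. The sketch below builds the install command from those two values so the variant is chosen deliberately rather than copied. The helper names are hypothetical, the cpu suffix for CPU-only builds is an assumption, and not every PyTorch/CUDA combination is necessarily published, so check the fairseq2 documentation before relying on a generated URL.

```python
from typing import Optional

# Base of the index URL shown in the example install command.
FAIRSEQ2_INDEX = "https://fair.pkg.atmeta.com/fairseq2/whl"


def fairseq2_variant(torch_version: str, cuda_version: Optional[str]) -> str:
    """Return the variant path segment, e.g. 'pt2.6.0/cu124'.

    Passing cuda_version=None selects a CPU-only variant (assumed naming).
    """
    if cuda_version is None:
        accelerator = "cpu"
    else:
        accelerator = "cu" + cuda_version.replace(".", "")
    return f"pt{torch_version}/{accelerator}"


def fairseq2_install_command(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the pip command for a given PyTorch/CUDA pair."""
    variant = fairseq2_variant(torch_version, cuda_version)
    return f"pip install fairseq2 --extra-index-url {FAIRSEQ2_INDEX}/{variant}"


command = fairseq2_install_command("2.6.0", "12.4")
print(command)
```

In practice the two arguments would come from the environment, for instance torch.__version__ and torch.version.cuda, so that the fairseq2 wheel matches the PyTorch build already installed.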

Highlighted Details

Supports 200 languages for text and 37 for speech.
Enables text-to-embedding, speech-to-embedding, text-to-text translation, and speech-to-text translation pipelines.
Includes BLASER 2.0 models for MT quality evaluation and MuTox for multilingual toxicity classification.
Embeddings are 1024-dimensional.
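Because every sentence maps to a vector of the same size, similarity search reduces to comparing vectors. The sketch below ranks candidates by cosine similarity using toy three-dimensional vectors in place of real 1024-dimensional embeddings. The embed_with_sonar function follows the text pipeline usage shown in the project's README as best recalled here (TextToEmbeddingModelPipeline with the text_sonar_basic_encoder model card); treat those names as assumptions to verify against the installed version. It is defined but not called, since calling it requires sonar-space and a model download.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(candidates)]
    return [idx for _, idx in sorted(scored, key=lambda pair: pair[0], reverse=True)]


def embed_with_sonar(sentences, source_lang="eng_Latn"):
    """Embed sentences with SONAR; API names are assumed from the README."""
    from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

    pipeline = TextToEmbeddingModelPipeline(
        encoder="text_sonar_basic_encoder",
        tokenizer="text_sonar_basic_encoder",
    )
    # predict returns a tensor with one row per sentence; convert to plain lists.
    return pipeline.predict(sentences, source_lang=source_lang).tolist()


# Toy stand-ins for sentence embeddings (illustrative values only).
toy_query = [1.0, 0.0, 0.0]
toy_candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
ranking = rank_by_similarity(toy_query, toy_candidates)
print(ranking)
```

With real embeddings, the query and the candidates can be in different languages, or one can be speech and the other text, because all of them live in the same space.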

Maintenance & Community

Developed by Meta's FAIR team and hosted under the facebookresearch GitHub organization.
Contribution guidelines are provided.
Citation information for the associated paper is available.

Licensing & Compatibility

SONAR code is MIT licensed.
Caution: Some SONAR models are released under a non-commercial license (NC_MODEL_LICENSE). Refer to LICENSE for details.

Limitations & Caveats

The dependency on fairseq2 requires careful version matching for PyTorch and CUDA.
Models released under the non-commercial license cannot be used in commercial applications, so check the license of each model before deploying it.

Health Check

Last Commit: 4 months ago
Responsiveness: 1 week
Pull Requests (30d): 1
Issues (30d): 0

Star History

13 stars in the last 30 days

Explore Similar Projects

speech-recognition-uk by egorsmkv

Resource collection for Ukrainian speech AI

Created 5 years ago

Updated 5 months ago

Lyra by JIA-Lab-research

Omni-cognition framework for speech, image, and video understanding/generation

Created 1 year ago

Updated 1 year ago

Meta-voicebox by SpeechifyInc

PyTorch implementation of Meta's Voicebox speech model

Created 2 years ago

Updated 2 years ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

LLaSM by LinkSoul-AI

Open-source speech-language assistant for multimodal conversation

Created 2 years ago

Updated 2 years ago

vits-simple-api by Artrajz

HTTP API for VITS-based text-to-speech and voice conversion

Created 3 years ago

Updated 4 months ago

pororo by kakaobrain

NLP SDK for natural language and speech processing tasks

Created 5 years ago

Updated 3 years ago

Starred by Abubakar Abid (Cofounder of Gradio).

voice-pro by abus-aikorea

WebUI for speech recognition, translation, and dubbing

Created 1 year ago

Updated 2 months ago

Starred by Tobi Lutke (Cofounder of Shopify), Luis Capelo (Cofounder of Lightning AI), and 2 more.

VoiceCraft by jasonppy

Zero-shot speech editing and TTS research paper

Created 1 year ago

Updated 11 months ago

Starred by Chaoyu Yang (Founder of Bento), Tim J. Baek (Founder of Open WebUI), and 7 more.

seamless_communication by facebookresearch

Multilingual speech and text translation models for natural communication

Created 2 years ago

Updated 1 year ago

Spark-TTS by SparkAudio

PyTorch code for efficient LLM-based text-to-speech inference

Created 1 year ago

Updated 10 months ago

Bert-VITS2 by fishaudio

VITS2 backbone for multilingual text-to-speech

Created 2 years ago

Updated 2 days ago

Starred by Jiaming Song (Chief Scientist at Luma AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

fish-speech by fishaudio

Open-source TTS for multilingual speech synthesis

Created 2 years ago

Updated 3 weeks ago
