SONAR by facebookresearch

Multilingual/multimodal embeddings for text and speech tasks

created 2 years ago
792 stars

Top 45.2% on sourcepulse

Project Summary

SONAR provides a multilingual, multimodal sentence embedding space with fixed-size vectors. Its text encoder outperforms earlier sentence embeddings such as LASER3 and LaBSE on the xsim and xsim++ cross-lingual similarity search tasks. Text and speech encoders and decoders support tasks such as translation and similarity search, which makes SONAR useful to researchers and developers working across languages and modalities.

How It Works

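SONAR uses teacher-student training on speech transcription data: speech encoders (the students) learn to embed speech segments into the same space as the text embeddings (the teacher). Because text and speech share one embedding space, encoders and decoders can be mixed, which enables cross-modal translation and zero-shot translation for language and modality pairs not seen together in training. The architecture includes separate encoders and decoders for text and speech, so pipelines can be assembled flexibly for NLP and speech processing tasks.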
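
The teacher-student idea can be illustrated with a toy sketch. It assumes a mean squared error objective (the summary does not name the loss) and, for brevity, updates the student embedding directly, whereas real training would update the parameters of a speech encoder. All names and the 4-dimensional vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def mse_loss(student: Vector, teacher: Vector) -> float:
    """Mean squared error between a student and a frozen teacher embedding."""
    if len(student) != len(teacher):
        raise ValueError("embeddings must have the same dimension")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def sgd_step(student: Vector, teacher: Vector, lr: float = 0.1) -> Vector:
    """One gradient-descent step on the MSE loss, moving only the student."""
    n = len(student)
    # d(loss)/d(s_i) = 2 * (s_i - t_i) / n
    return [s - lr * 2.0 * (s - t) / n for s, t in zip(student, teacher)]


# Toy 4-dimensional vectors stand in for real sentence embeddings.
teacher_text_embedding = [0.5, -1.0, 0.25, 2.0]  # frozen, never updated
student_speech_embedding = [0.0, 0.0, 0.0, 0.0]  # trainable

loss_before = mse_loss(student_speech_embedding, teacher_text_embedding)
for _ in range(50):
    student_speech_embedding = sgd_step(
        student_speech_embedding, teacher_text_embedding
    )
loss_after = mse_loss(student_speech_embedding, teacher_text_embedding)

print(f"loss before: {loss_before:.6f}, loss after: {loss_after:.6f}")
```

The teacher vector is never modified, which is the point of the setup: the text space stays fixed and the speech side is pulled toward it.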

Quick Start & Requirements

  • Install via pip: pip install sonar-space
  • Requires fairseq2 with specific PyTorch and CUDA versions (e.g., pip install fairseq2 --extra-index-url https://fair.pkg.atmeta.com/fairseq2/whl/pt2.6.0/cu124).
  • Models are automatically downloaded to $TORCH_HOME/hub.
  • GPU acceleration is recommended for performance.
  • Demo notebooks are available for detailed examples.
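
Because the fairseq2 wheel must match the installed PyTorch and CUDA versions, it can help to build the index URL programmatically. The helper below is a hypothetical sketch that only mirrors the URL pattern in the example above; it does not verify that a wheel is published for a given combination.

```python
def fairseq2_index_url(torch_version: str, cuda_version: str) -> str:
    """Build the extra index URL for a PyTorch / CUDA version pair.

    Hypothetical helper: follows the pattern .../whl/pt<torch>/cu<cuda>
    seen in the install example and performs no availability check.
    """
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return f"https://fair.pkg.atmeta.com/fairseq2/whl/pt{torch_version}/{cuda_tag}"


def pip_install_command(torch_version: str, cuda_version: str) -> str:
    """Return the pip command line for the matching fairseq2 wheel."""
    url = fairseq2_index_url(torch_version, cuda_version)
    return f"pip install fairseq2 --extra-index-url {url}"


print(pip_install_command("2.6.0", "12.4"))
```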

Highlighted Details

  • Supports 200 languages for text and 37 for speech.
  • Enables text-to-text, speech-to-text, and speech-to-embedding tasks.
  • Includes BLASER 2.0 models for MT quality evaluation and MuTox for multilingual toxicity classification.
  • Embeddings are 1024-dimensional.
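
Since every sentence maps to a fixed-size vector, similarity search reduces to comparing vectors. The sketch below assumes cosine similarity as the metric (a common choice, not confirmed by this summary) and uses toy 3-dimensional vectors in place of real 1024-dimensional embeddings; the function and variable names are illustrative.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(
    query: List[float], candidates: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the name and score of the candidate closest to the query."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy vectors; real embeddings would have 1024 dimensions.
query = [1.0, 0.0, 1.0]
candidates = {
    "close_paraphrase": [0.9, 0.1, 1.1],
    "unrelated": [-1.0, 1.0, 0.0],
}

best_name, best_score = most_similar(query, candidates)
print(best_name, round(best_score, 4))
```

The same comparison applies regardless of whether a vector came from a text or a speech encoder, because both modalities share one space.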

Maintenance & Community

  • Developed by Facebook Research.
  • Contribution guidelines are provided.
  • Citation information for the associated paper is available.

Licensing & Compatibility

  • SONAR code is MIT licensed.
  • Caution: Some SONAR models are released under a non-commercial license (NC_MODEL_LICENSE). Refer to LICENSE for details.

Limitations & Caveats

  • The dependency on fairseq2 requires careful version matching for PyTorch and CUDA.
  • Some models have non-commercial restrictions, limiting their use in commercial applications.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 3
  • Issues (30d): 0
  • Star history: 63 stars in the last 90 days
