Multilingual/multimodal embeddings for text and speech tasks
Top 45.2% on sourcepulse
SONAR provides a fixed-size, multilingual, and multimodal sentence embedding space, outperforming existing methods on cross-lingual similarity tasks. It supports text and speech encoders/decoders for tasks like translation and similarity search, benefiting researchers and developers working with diverse languages and modalities.
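For example, a few lines are enough to embed sentences into the shared space. The sketch below mirrors the text-embedding pipeline and model names used in the SONAR README (TextToEmbeddingModelPipeline, text_sonar_basic_encoder); exact class and argument names may differ between releases.

from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

# Load the basic multilingual text encoder; the same card name is reused for the tokenizer.
t2vec_model = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)

sentences = ["My name is SONAR.", "I can embed sentences into a vector space."]

# Each sentence becomes one fixed-size embedding (1024 dimensions for the basic model).
embeddings = t2vec_model.predict(sentences, source_lang="eng_Latn")
print(embeddings.shape)  # expected: torch.Size([2, 1024])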
How It Works
SONAR uses a teacher-student training approach on speech transcription data to embed speech segments into the same space as text, which enables cross-modal and zero-shot translation. The architecture includes separate encoders and decoders for text and speech, so pipelines can be assembled flexibly for a range of NLP and speech processing tasks.
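As a sketch of the cross-modal idea, the snippet below embeds an English utterance and an English sentence into the same space and compares them. The pipeline and model names (SpeechToEmbeddingModelPipeline, sonar_speech_encoder_eng, text_sonar_basic_encoder) follow the patterns in the project README and should be checked against the installed version; the audio path is a placeholder.

import torch
from sonar.inference_pipelines.speech import SpeechToEmbeddingModelPipeline
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

# Speech and text encoders that map into the same SONAR embedding space.
speech_encoder = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_eng")
text_encoder = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)

speech_emb = speech_encoder.predict(["utterance.wav"])  # placeholder audio file
text_emb = text_encoder.predict(["Hello, how are you?"], source_lang="eng_Latn")

# Because both modalities share one space, cosine similarity is meaningful across them.
print(torch.nn.functional.cosine_similarity(speech_emb, text_emb))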
Quick Start & Requirements
Install the package with:
pip install sonar-space
fairseq2 must be installed with wheels matching your PyTorch and CUDA versions (e.g., pip install fairseq2 --extra-index-url https://fair.pkg.atmeta.com/fairseq2/whl/pt2.6.0/cu124). Model checkpoints are cached under $TORCH_HOME/hub.
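If the default cache location is unsuitable, TORCH_HOME can be pointed elsewhere before the pipelines are imported; this is a minimal sketch assuming the $TORCH_HOME/hub behavior described above, and the path used is only an example.

import os

# Checkpoints land under $TORCH_HOME/hub, so overriding TORCH_HOME redirects the cache.
os.environ["TORCH_HOME"] = "/data/sonar_cache"  # hypothetical location

from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

model = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)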
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
fairseq2 requires careful version matching for PyTorch and CUDA.
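A quick way to check whether the installed wheels line up is to print the relevant versions. This is a generic Python check, not a SONAR-specific tool.

import importlib.metadata
import torch

# fairseq2 wheels are built against specific PyTorch/CUDA combinations, so these
# values should correspond to the wheel index used at install time.
print("torch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("fairseq2:", importlib.metadata.version("fairseq2"))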