Multilingual speech and text translation models for natural communication
Seamless provides a suite of foundational AI models for advanced speech and text translation, targeting researchers and developers building multilingual communication applications. It offers capabilities for speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation across approximately 100 languages, with specific models focusing on expressive prosody preservation and real-time streaming translation.
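The four translation directions form a 2×2 grid of input and output modalities. The sketch below is a hypothetical helper that names the task for each pair. The short codes (S2ST, S2TT, T2ST, T2TT) follow the naming commonly used for these tasks. Here they are an assumption, not a documented Seamless API.

```python
from enum import Enum


class Modality(Enum):
    """Input or output modality (hypothetical, for illustration only)."""
    SPEECH = "S"
    TEXT = "T"


def task_code(source: Modality, target: Modality) -> str:
    """Return an illustrative task code such as 'S2TT' (speech-to-text translation).

    Hypothetical helper: the code is built as <source>2<target>T, where the
    trailing 'T' stands for 'translation'.
    """
    return f"{source.value}2{target.value}T"


# Enumerate all four source/target combinations (speech-to-speech, speech-to-text,
# text-to-speech, text-to-text), in Enum declaration order.
ALL_TASKS = [task_code(s, t) for s in Modality for t in Modality]
```

For example, `task_code(Modality.TEXT, Modality.SPEECH)` yields `"T2ST"` (text-to-speech translation).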
How It Works
The SeamlessM4T v2 models are built on the UnitY2 architecture, which improves both translation quality and inference speed. SeamlessExpressive builds on this foundation by preserving prosodic features such as speech rate and pauses, along with the speaker's vocal style, while maintaining translation accuracy. SeamlessStreaming uses a streaming-optimized architecture to support simultaneous translation and automatic speech recognition (ASR). Combined with an expressive vocoder, it forms the basis of the unified Seamless model.
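Simultaneous translation means emitting output before the full source has arrived, so the system repeatedly decides whether to READ more input or WRITE more output. SeamlessStreaming uses its own learned policy. The sketch below substitutes the simpler, classic wait-k policy purely to illustrate those read/write decisions. The function names and token-level granularity are assumptions.

```python
from typing import List, Tuple


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[Tuple[str, int]]:
    """Return the READ/WRITE actions of a wait-k simultaneous policy.

    Illustrative stand-in (not SeamlessStreaming's policy): first read k source
    tokens, then alternate one WRITE per READ; once the source is exhausted,
    write the remaining target tokens. Each action carries the token index.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions: List[Tuple[str, int]] = []
    read, written = 0, 0
    while written < target_len:
        # Target token `written` may be emitted once min(k + written, source_len)
        # source tokens have been read.
        if read < min(k + written, source_len):
            actions.append(("READ", read))
            read += 1
        else:
            actions.append(("WRITE", written))
            written += 1
    return actions


def delays(actions: List[Tuple[str, int]]) -> List[int]:
    """Number of source tokens read before each target token was written."""
    out: List[int] = []
    read = 0
    for action, _ in actions:
        if action == "READ":
            read += 1
        else:
            out.append(read)
    return out
```

With `k=2`, a 5-token source, and a 4-token target, the per-token delays are `[2, 3, 4, 5]`. Per-token delays of this kind are the raw input to streaming latency metrics, such as those computed by evaluation tools like SimulEval.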
Quick Start & Requirements
Install the package with pip install . from the repository root; fairseq2 is required. The ffmpeg command-line tool is also required for Whisper, which is used for metrics.
Highlighted Details
unity.cpp enables GGML integration for running the models on-device or in C/C++ environments.
Maintenance & Community
The project is developed by Meta AI. Key components include the fairseq2 library for sequence modeling, SONAR for multilingual embeddings, BLASER 2.0 for multimodal translation evaluation, stopes for dataset mining, and SimulEval for streaming translation evaluation.
Licensing & Compatibility
Limitations & Caveats
The CC-BY-NC 4.0 license restricts commercial use of the SeamlessM4T and SeamlessStreaming models. Access to SeamlessExpressive must be requested separately, and it is governed by its own license and acceptable use policy. Installation of fairseq2 is limited to specific platforms.
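Because fairseq2 installs only on certain platforms and the Whisper-based metrics need the ffmpeg binary, it can help to check both before running anything. The helper below is a hypothetical sketch, not part of the Seamless codebase. It uses only the Python standard library and assumes the package's import name is `fairseq2`.

```python
import importlib.util
import shutil
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether platform-dependent prerequisites are available.

    Hypothetical helper: checks that the `fairseq2` package is importable and
    that the `ffmpeg` executable (needed for Whisper-based metrics) is on PATH.
    """
    return {
        # find_spec returns None when a top-level package is not installed.
        "fairseq2": importlib.util.find_spec("fairseq2") is not None,
        # shutil.which returns None when the executable is not found on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The helper only detects whether the prerequisites are present. It does not install anything, so a missing entry still has to be resolved by following the project's installation instructions for the target platform.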