facebookresearch
Multilingual speech and text translation models for natural communication
Top 4.3% on SourcePulse
Seamless provides a suite of foundational AI models for advanced speech and text translation, targeting researchers and developers building multilingual communication applications. It offers capabilities for speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation across approximately 100 languages, with specific models focusing on expressive prosody preservation and real-time streaming translation.
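The four translation directions listed above are the combinations of a speech or text input with a speech or text output. The sketch below shows that mapping, using the abbreviations commonly used for these tasks (S2ST, S2TT, T2ST, T2TT). The names `Modality`, `Task`, `TASKS`, and `pick_task` are hypothetical and chosen for illustration; they are not part of the Seamless API.

```python
from enum import Enum
from typing import NamedTuple


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


class Task(NamedTuple):
    source: Modality
    target: Modality


# Hypothetical lookup table: task abbreviation -> (input modality, output modality).
TASKS = {
    "S2ST": Task(Modality.SPEECH, Modality.SPEECH),  # speech-to-speech
    "S2TT": Task(Modality.SPEECH, Modality.TEXT),    # speech-to-text
    "T2ST": Task(Modality.TEXT, Modality.SPEECH),    # text-to-speech
    "T2TT": Task(Modality.TEXT, Modality.TEXT),      # text-to-text
}


def pick_task(source: Modality, target: Modality) -> str:
    """Return the task abbreviation for a given input/output modality pair."""
    wanted = Task(source, target)
    for name, task in TASKS.items():
        if task == wanted:
            return name
    raise ValueError(f"no task for {source.value} -> {target.value}")
```

For example, `pick_task(Modality.SPEECH, Modality.TEXT)` selects the speech-to-text translation task.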
How It Works
The project uses the UnitY2 architecture for its SeamlessM4T v2 models, which improves translation quality and inference speed. SeamlessExpressive builds on this by preserving prosodic features such as speech rate and pauses, while maintaining voice style and translation accuracy. SeamlessStreaming enables simultaneous translation and automatic speech recognition (ASR) through a streaming-optimized architecture; combined with an expressive vocoder, it forms the basis of the unified Seamless model.
Quick Start & Requirements
Install: pip install .
Prerequisites: fairseq2. The ffmpeg command-line tool is required for Whisper (used for metrics).
Highlighted Details
unity.cpp enables GGML integration for on-device or C/C++ environments.
Maintenance & Community
The project is developed by Meta AI. Key components include the fairseq2 library for sequence modeling, SONAR for multilingual embeddings, BLASER 2.0 for multimodal translation evaluation, stopes for dataset mining, and SimulEval for streaming translation evaluation.
Licensing & Compatibility
Limitations & Caveats
The CC-BY-NC 4.0 license restricts commercial use of SeamlessM4T and SeamlessStreaming models. SeamlessExpressive requires a separate request and has its own license and acceptable use policy. Installation of fairseq2 is limited to specific platforms.
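Because fairseq2 installs only on specific platforms and ffmpeg must be available on the command line, it can help to check both before running anything. The following is a minimal sketch using only the Python standard library. The function name `check_prerequisites` and the report format are assumptions made for illustration, not part of the project.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("fairseq2",), tools=("ffmpeg",)) -> dict:
    """Report which Python modules are importable and which CLI tools are on PATH.

    Hypothetical helper: the defaults reflect the prerequisites named in this
    summary, and only top-level module names are supported.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[f"tool:{name}"] = shutil.which(name) is not None
    return report
```

Calling `check_prerequisites()` returns a dictionary such as `{"module:fairseq2": ..., "tool:ffmpeg": ...}`, where each value is `True` if the prerequisite was found on the current machine.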