seamless_communication by facebookresearch

Multilingual speech and text translation models for natural communication

Created 2 years ago

11,754 stars

Top 4.3% on SourcePulse

Project Summary

Seamless is a family of foundational AI models for speech and text translation, aimed at researchers and developers building multilingual communication applications. It covers speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation across roughly 100 languages. Dedicated models add expressive prosody preservation and real-time streaming translation.
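The four translation directions can be pictured as a lookup from input and output modality to a short task code. The sketch below is only an illustration: the codes S2ST, S2TT, T2ST and T2TT are an assumption modeled on the abbreviations used in the project's README, and `task_code` and `TASKS` are hypothetical names, not part of the library.

```python
# Hypothetical lookup: (input modality, output modality) -> task code.
# The code strings are an assumption modeled on the project's README abbreviations.
TASKS = {
    ("speech", "speech"): "S2ST",  # speech-to-speech translation
    ("speech", "text"): "S2TT",    # speech-to-text translation
    ("text", "speech"): "T2ST",    # text-to-speech translation
    ("text", "text"): "T2TT",      # text-to-text translation
}


def task_code(input_modality: str, output_modality: str) -> str:
    """Return the task code for a modality pair (case-insensitive), or raise ValueError."""
    key = (input_modality.lower(), output_modality.lower())
    try:
        return TASKS[key]
    except KeyError:
        raise ValueError(f"unsupported modality pair: {key}") from None


print(task_code("speech", "text"))  # S2TT
```

Keeping the mapping in one table makes an unsupported combination fail loudly instead of silently picking a default task.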

How It Works

SeamlessM4T v2 is built on the new UnitY2 architecture, which improves translation quality and inference speed. SeamlessExpressive builds on it to preserve prosodic features such as speech rate and pauses, along with the speaker's vocal style, while keeping translations accurate. SeamlessStreaming uses a streaming-optimized architecture for simultaneous translation and ASR; combined with an expressive vocoder, it forms the basis of the unified Seamless model.
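To make "speech rate and pauses" concrete, here is a hypothetical way to measure them from word-level timestamps. This only illustrates what those quantities are; it is not how SeamlessExpressive represents or transfers prosody internally. The `Word` class, the helper names, and the 0.2-second pause threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def speech_rate(words: list[Word]) -> float:
    """Words per second from the first word's start to the last word's end."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    return len(words) / duration if duration > 0 else 0.0


def pauses(words: list[Word], min_gap: float = 0.2) -> list[float]:
    """Silent gaps between consecutive words lasting at least min_gap seconds (assumed threshold)."""
    gaps = []
    for prev, cur in zip(words, words[1:]):
        gap = cur.start - prev.end
        if gap >= min_gap:
            gaps.append(round(gap, 3))
    return gaps


utterance = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("friend", 1.6, 2.0)]
print(speech_rate(utterance))  # 1.5
print(pauses(utterance))       # [0.7]
```

An expressive translation would aim to keep values like these roughly similar between the source speech and the translated speech.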

Quick Start & Requirements

Installation: clone the repository, then run pip install . from its root.
Prerequisites: fairseq2 can be installed only on Linux x86-64 or Apple Silicon Macs. The ffmpeg command-line tool is required for Whisper, which is used for metrics.
Resources: SeamlessM4T-Large v2 has 2.3B parameters. Access to SeamlessExpressive model artifacts must be requested through a form.
Demos & Docs: SeamlessM4T v2 Demo, SeamlessExpressive Demo, SeamlessStreaming Demo, NeurIPS 2023 Tutorial.
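Before installing, the prerequisites and model size above can be turned into a quick pre-flight check. This is a minimal sketch: the helper names are hypothetical, the platform strings assume Python's `platform` module reports "Linux"/"x86_64" and "Darwin"/"arm64" on the supported machines, and the memory figure covers the weights alone, assuming 2 bytes per parameter in fp16 and 4 in fp32. Real inference needs more memory than this.

```python
import platform
import shutil


def fairseq2_platform_supported(system: str, machine: str) -> bool:
    """True for Linux x86-64 or Apple Silicon macOS, the platforms listed for fairseq2."""
    if system == "Linux":
        return machine == "x86_64"
    if system == "Darwin":
        return machine == "arm64"
    return False


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights only (no activations), in gigabytes of 1e9 bytes."""
    return num_params * bytes_per_param / 1e9


print("fairseq2 platform supported:",
      fairseq2_platform_supported(platform.system(), platform.machine()))
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
# SeamlessM4T-Large v2 has 2.3B parameters.
print("fp16 weights (GB):", weight_memory_gb(2.3e9, 2))  # 4.6
print("fp32 weights (GB):", weight_memory_gb(2.3e9, 4))  # 9.2
```

So the 2.3B-parameter model needs about 4.6 GB just to hold its weights in half precision, or about 9.2 GB in full precision.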

Highlighted Details

SeamlessM4T v2 offers improved quality and reduced inference latency over v1.
SeamlessExpressive preserves voice style and prosody in speech-to-speech translation.
SeamlessStreaming supports real-time ASR and translation.
unity.cpp provides a GGML-based implementation for on-device and C/C++ environments.
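"Real-time" translation means the system starts emitting output before the speaker has finished. The textbook way to illustrate this is the fixed wait-k policy: read k source tokens, then alternate one write per read. The sketch below shows only that simple policy and is not the read/write policy SeamlessStreaming uses; `wait_k_schedule` is a hypothetical name.

```python
def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> list[str]:
    """READ ("R") / WRITE ("W") actions for the classic wait-k policy.

    Before emitting target token t (0-based), the policy must have read
    min(k + t, src_len) source tokens. Once the source is exhausted,
    the remaining target tokens are written back to back.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    actions: list[str] = []
    read = 0
    for t in range(tgt_len):
        needed = min(k + t, src_len)
        while read < needed:
            actions.append("R")
            read += 1
        actions.append("W")
    return actions


print("".join(wait_k_schedule(src_len=4, tgt_len=4, k=2)))  # RRWRWRWW
```

A smaller k lowers latency but gives the model less context per output token; learned streaming policies decide when to read or write adaptively instead of on a fixed schedule.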

Maintenance & Community

The project is developed by Meta AI. Key components include the fairseq2 library for sequence modeling, SONAR for multilingual embeddings, BLASER 2.0 for multimodal translation evaluation, stopes for dataset mining, and SimulEval for streaming translation evaluation.

Licensing & Compatibility

W2v-BERT 2.0 encoder, mExpresso text data, UnitY2 aligner, ETOX, and MuTox are MIT licensed.
SeamlessM4T and SeamlessStreaming models are CC-BY-NC 4.0 licensed (non-commercial use).
Seamless and SeamlessExpressive models are released under the "Seamless" license; its specific terms are not detailed in the README but likely restrict commercial use.

Limitations & Caveats

The CC-BY-NC 4.0 license restricts commercial use of SeamlessM4T and SeamlessStreaming models. SeamlessExpressive requires a separate access request and has its own license and acceptable use policy. fairseq2 can be installed only on Linux x86-64 and Apple Silicon Macs.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

28 stars in the last 30 days