seamless_communication by facebookresearch

Multilingual speech and text translation models for natural communication

created 2 years ago
11,613 stars

Top 4.4% on sourcepulse

Project Summary

Seamless provides a suite of foundational AI models for advanced speech and text translation, targeting researchers and developers building multilingual communication applications. It offers capabilities for speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation across approximately 100 languages, with specific models focusing on expressive prosody preservation and real-time streaming translation.
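The four translation directions named above can be read as a small table of input and output modalities. The following is a minimal sketch of that table, not the project's API: the `Task` class, the `TASKS` dict, and the `pick_task` helper are hypothetical names introduced for illustration, and the abbreviations (S2ST, S2TT, T2ST, T2TT) are used here only as shorthand for the directions listed in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One translation direction, described by its input and output modality."""
    source: str  # "speech" or "text"
    target: str  # "speech" or "text"


# Hypothetical lookup table for the four directions named in the summary.
TASKS = {
    "S2ST": Task("speech", "speech"),  # speech-to-speech
    "S2TT": Task("speech", "text"),    # speech-to-text
    "T2ST": Task("text", "speech"),    # text-to-speech
    "T2TT": Task("text", "text"),      # text-to-text
}


def pick_task(source: str, target: str) -> str:
    """Return the shorthand for the direction matching the given modalities."""
    wanted = Task(source, target)
    for name, task in TASKS.items():
        if task == wanted:
            return name
    raise ValueError(f"no task for {source!r} -> {target!r}")


if __name__ == "__main__":
    print(pick_task("speech", "text"))
```

Keying the table by modality pair keeps the choice of direction explicit in application code, whichever model is eventually called to carry it out.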

How It Works

The project leverages a novel UnitY2 architecture for its SeamlessM4T v2 models, enhancing translation quality and inference speed. SeamlessExpressive builds upon this by incorporating prosodic features like speech rate and pauses, while maintaining voice style and translation accuracy. SeamlessStreaming enables simultaneous translation and ASR through a streaming-optimized architecture, forming the basis of the unified Seamless model when combined with an expressive vocoder.

Quick Start & Requirements

Highlighted Details

  • SeamlessM4T v2 offers improved quality and reduced inference latency over v1.
  • SeamlessExpressive preserves voice style and prosody in speech-to-speech translation.
  • SeamlessStreaming supports real-time ASR and translation.
  • unity.cpp enables GGML integration for on-device or C/C++ environments.

Maintenance & Community

The project is developed by Meta AI. Key components include the fairseq2 library for sequence modeling, SONAR for multilingual embeddings, BLASER 2.0 for multimodal translation evaluation, stopes for dataset mining, and SimulEval for streaming translation evaluation.

Licensing & Compatibility

  • W2v-BERT 2.0 encoder, mExpresso text data, UnitY2 aligner, ETOX, and MuTox are MIT licensed.
  • SeamlessM4T and SeamlessStreaming models are CC-BY-NC 4.0 licensed (non-commercial use).
  • Seamless and SeamlessExpressive models are "Seamless licensed" (specific terms not detailed in README, likely non-commercial).
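For a project that mixes these components, the license list above can be encoded as a lookup so that a commercial-use check is explicit. This is a sketch under stated assumptions: the dict names and the `commercial_use_allowed` helper are hypothetical, and the "Seamless" license is mapped to `None` (unknown) because its terms are not detailed in the summary. It is not legal advice and does not replace reading the license texts.

```python
from typing import Optional

# Component -> license name, transcribed from the list above.
LICENSES = {
    "W2v-BERT 2.0 encoder": "MIT",
    "mExpresso text data": "MIT",
    "UnitY2 aligner": "MIT",
    "ETOX": "MIT",
    "MuTox": "MIT",
    "SeamlessM4T": "CC-BY-NC 4.0",
    "SeamlessStreaming": "CC-BY-NC 4.0",
    "Seamless": "Seamless",
    "SeamlessExpressive": "Seamless",
}

# License name -> whether commercial use is permitted.
# None means "not stated in the summary; check the license text".
COMMERCIAL_OK = {
    "MIT": True,
    "CC-BY-NC 4.0": False,
    "Seamless": None,
}


def commercial_use_allowed(component: str) -> Optional[bool]:
    """Return True, False, or None (unknown) for the named component."""
    return COMMERCIAL_OK[LICENSES[component]]


if __name__ == "__main__":
    for name in LICENSES:
        print(f"{name}: {commercial_use_allowed(name)}")
```

Returning `None` instead of guessing forces callers to handle the unknown case separately from a clear yes or no.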

Limitations & Caveats

The CC-BY-NC 4.0 license restricts commercial use of SeamlessM4T and SeamlessStreaming models. SeamlessExpressive requires a separate access request and has its own license and acceptable use policy. Installation depends on fairseq2, whose pre-built packages are available only for Linux x86-64 and Apple-silicon Macs.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 162 stars in the last 90 days
