Open-source effort for NLLB checkpoints, aiming for commercial use
This repository provides open-source checkpoints and training code for Meta's NLLB (No Language Left Behind) machine translation system, with the goal of enabling commercial use of models that support over 200 languages. It targets researchers and developers who need high-quality multilingual translation, particularly for low-resource languages, and aims to democratize AI by offering freely usable models.
How It Works
The project builds on Meta's NLLB architecture, which includes dense transformer models of varying sizes (600M to 3.3B parameters) and a Mixture-of-Experts (MoE) model (54.5B parameters). It uses a SentencePiece model (SPM-200) trained on 200+ languages to encode data, and it provides code for data mining, data preparation, training, and inference. Together, these components cover the full pipeline from raw text to translation across 200+ languages.
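The parameter counts above give a rough sense of what hardware each model needs. The following is a minimal sketch, not part of the repository: the checkpoint labels and the helper names `weight_gb` and `largest_fitting` are hypothetical, and the estimate assumes 2 bytes per parameter (fp16) and counts weights only, ignoring activations and optimizer state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    name: str      # hypothetical label, not an official checkpoint name
    params: float  # parameter count as stated in the text above
    kind: str      # "dense" or "moe"


# Only the sizes named in the text; intermediate sizes are omitted.
CHECKPOINTS = [
    Checkpoint("dense-600M", 600e6, "dense"),
    Checkpoint("dense-3.3B", 3.3e9, "dense"),
    Checkpoint("moe-54.5B", 54.5e9, "moe"),
]


def weight_gb(ckpt, bytes_per_param=2):
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return ckpt.params * bytes_per_param / 1e9


def largest_fitting(budget_gb, bytes_per_param=2):
    """Return the largest checkpoint whose weights fit in budget_gb, or None."""
    fitting = [c for c in CHECKPOINTS
               if weight_gb(c, bytes_per_param) <= budget_gb]
    return max(fitting, key=lambda c: c.params, default=None)


if __name__ == "__main__":
    for c in CHECKPOINTS:
        print(f"{c.name}: ~{weight_gb(c):.1f} GB of weights in fp16")
```

Note that for the MoE model the full parameter count is what matters for memory: every expert has to be loaded even though only a subset is active for any given token.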
Quick Start & Requirements
See the INSTALL guide and the fairseq README.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The primary caveat is the conflicting licensing information regarding model usage: the project aims for commercial use, but the models are explicitly stated to be under a CC-BY-NC 4.0 license, which prohibits commercial applications.