Neural word aligner for multilingual BERT models
Awesome-align provides a neural approach to word alignment using multilingual BERT (mBERT). It targets NLP researchers and practitioners needing to extract word alignments from parallel corpora, offering improved quality over traditional statistical methods and enabling fine-tuning for specific language pairs.
How It Works
Awesome-align derives word alignments from mBERT's contextualized embeddings. Its 'softmax' extraction computes a similarity matrix between source and target token embeddings, normalizes it with a softmax in each direction, and keeps token pairs that score highly in both directions. The tool also supports fine-tuning mBERT on parallel data with objectives such as masked language modeling (MLM), translation language modeling (TLM), and a self-training objective (SO) to enhance alignment quality.
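As a rough sketch of the softmax extraction idea, the following toy example aligns tokens whose bidirectional softmax probabilities jointly exceed a threshold. Function names, the threshold value, and the random vectors are illustrative; the real tool operates on mBERT's contextual embeddings.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract_alignments(src_emb, tgt_emb, threshold=1e-3):
    """Align source token i to target token j when the product of the
    two directional softmax probabilities exceeds the threshold."""
    sim = src_emb @ tgt_emb.T      # (n_src, n_tgt) similarity matrix
    p_s2t = softmax(sim, axis=1)   # each source token over all targets
    p_t2s = softmax(sim, axis=0)   # each target token over all sources
    mask = (p_s2t * p_t2s) > threshold
    return [(i, j) for i, j in zip(*np.nonzero(mask))]
```

With near-orthogonal embeddings for matching tokens, this returns the expected diagonal alignment pairs.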
Quick Start & Requirements
pip install -r requirements.txt
python setup.py install
Input parallel text should be pre-tokenized, one sentence pair per line, with the source and target sentences separated by |||.
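A minimal sketch of producing a data file in the one-pair-per-line ||| format (the file name and sentence pairs are made up for illustration):

```python
# Write a toy parallel corpus in the "source ||| target" format,
# one tokenized sentence pair per line.
pairs = [
    ("we love nlp", "wir lieben nlp"),
    ("good morning", "guten morgen"),
]
with open("examples.src-tgt", "w", encoding="utf-8") as f:
    for src, tgt in pairs:
        f.write(f"{src} ||| {tgt}\n")
```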
Highlighted Details
Maintenance & Community
The project is associated with Neulab and its authors. Code is partially borrowed from HuggingFace Transformers (Apache 2.0).
Licensing & Compatibility
Code is licensed under Apache 2.0, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
The README does not specify hardware requirements beyond recommending a GPU, nor does it estimate the time needed for fine-tuning. Performance claims are based on specific datasets and language pairs, and may not transfer to other settings.