Library for masked language model scoring (ACL 2020 paper)
Top 81.6% on sourcepulse
This Python library and accompanying examples enable scoring sentences and rescoring n-best lists using Masked Language Models (MLMs) like BERT and RoBERTa, as well as autoregressive models like GPT-2. It targets researchers and practitioners in speech recognition, machine translation, and linguistic acceptability, offering improved language model integration for these tasks.
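As a toy illustration of the n-best rescoring described above, the sketch below interpolates a first-pass score (e.g. from an acoustic or translation model) with a language-model score and re-ranks hypotheses. The function name, the interpolation weight, and all scores are made up for illustration; the real library computes the LM scores with a pretrained MLM.

```python
def rescore_nbest(hypotheses, lm_weight=0.5):
    """Re-rank (text, first_pass_score, lm_score) triples by interpolated score.

    Scores are log-probabilities, so higher (less negative) is better.
    """
    rescored = [
        (text, first_pass + lm_weight * lm)
        for text, first_pass, lm in hypotheses
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 2-best list: the first-pass model slightly prefers a typo,
# but the language model strongly penalizes it.
nbest = [
    ("the cat sat on the mat", -10.2, -8.1),
    ("the cat sat on the matt", -10.0, -12.5),
]
best_text, best_score = rescore_nbest(nbest)[0]
# best_text == "the cat sat on the mat" (-14.25 vs -16.25)
```

This linear interpolation is the standard shallow-fusion recipe for rescoring; the weight is typically tuned on a development set.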
How It Works
The library computes pseudo-log-likelihood (PLL) scores by masking individual words within sentences and leveraging the predictive capabilities of MLMs. It also supports direct log-probability scoring for autoregressive models. This approach allows for unsupervised ranking and rescoring of hypotheses, providing a flexible way to integrate powerful pre-trained language models into various NLP pipelines.
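The PLL computation can be sketched in plain Python: mask one position at a time, ask the model for the log-probability of the true token at that position, and sum. Here `masked_log_prob` is a stand-in for a real MLM forward pass (e.g. BERT scoring the token at the masked position); the toy model below just returns a constant probability, so the example stays self-contained.

```python
import math

def pseudo_log_likelihood(tokens, masked_log_prob):
    """Sum, over all positions, the model's log-probability of the true
    token when that single position is masked and the rest are visible."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        total += masked_log_prob(context, i, token)
    return total

# Toy "model": assigns probability 0.9 to every true token.
toy_model = lambda context, position, token: math.log(0.9)

sentence = ["the", "cat", "sat"]
pll = pseudo_log_likelihood(sentence, toy_model)
# pll == 3 * log(0.9), one term per masked position
```

Note that a sentence of n tokens costs n forward passes, which is why PLL scoring is typically used for ranking fixed hypothesis lists rather than generation.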
Quick Start & Requirements
Install the library from the repository root:

    pip install -e .

Scoring MXNet-based models on GPU requires an MXNet build that matches your CUDA version, for example:

    pip install mxnet-cu102mkl

(the CUDA 10.2 build with MKL; substitute the variant that matches your environment).

Highlighted Details

The library scores sentences with MLMs (BERT, RoBERTa) via pseudo-log-likelihood and with autoregressive models (GPT-2) via ordinary log-probability, and includes examples for rescoring n-best lists from speech recognition and machine translation systems.
Maintenance & Community
The project originates from AWS Labs. Further community engagement details (e.g., Discord, Slack, roadmap) are not explicitly mentioned in the README.
Licensing & Compatibility
The project is released under the Apache License 2.0, which permits commercial use and integration with closed-source projects.
Limitations & Caveats
The PyTorch interface is marked as experimental. The installation requires specific MXNet versions tied to CUDA versions, which may require careful environment management.