Benchmark for cross-lingual generalization evaluation of multilingual models
XTREME is a benchmark for evaluating the cross-lingual generalization capabilities of pre-trained multilingual language models. It targets researchers and practitioners in NLP who need to assess model performance across a wide range of languages and tasks, providing a standardized framework for comparing zero-shot cross-lingual transfer abilities.
How It Works
XTREME comprises nine diverse NLP tasks, including sentence classification, named entity recognition, and question answering, spanning 40 typologically diverse languages. The benchmark's core evaluation methodology is zero-shot cross-lingual transfer: models are fine-tuned on English data for each task and then evaluated on test data in other languages. This approach directly measures a model's ability to generalize learned representations across linguistic boundaries without task-specific multilingual fine-tuning.
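The zero-shot transfer protocol described above can be sketched as follows. This is a minimal, self-contained illustration of the evaluation loop only; the toy lookup "model", the `fine_tune`/`evaluate` helpers, and the sample data are all hypothetical stand-ins, not part of the XTREME codebase.

```python
# Sketch of XTREME's zero-shot cross-lingual transfer protocol:
# fine-tune on English task data, then evaluate the same model on each
# target language's test set with no target-language training.

def fine_tune(model, english_train):
    # Stand-in for task-specific fine-tuning on English data.
    for text, label in english_train:
        model.setdefault(text.lower(), label)
    return model

def evaluate(model, test_set):
    # Accuracy of the fine-tuned model on one language's test set.
    correct = sum(1 for text, label in test_set
                  if model.get(text.lower()) == label)
    return correct / len(test_set)

# Toy "model": a lookup table standing in for learned representations.
model = fine_tune({}, [("good", 1), ("bad", 0)])

# Zero-shot evaluation: the non-English test sets are never trained on.
test_sets = {
    "en": [("good", 1), ("bad", 0)],
    "de": [("gut", 1), ("schlecht", 0)],
}
scores = {lang: evaluate(model, data) for lang, data in test_sets.items()}
print(scores)  # en scores 1.0; de scores 0.0
```

The toy model scores 0.0 on German because a plain lookup table shares nothing across languages; the benchmark's premise is that pre-trained multilingual representations close exactly this gap.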
Quick Start & Requirements
Run bash install_tools.sh to install dependencies. Place the panx_dataset in a download folder, then run bash scripts/download_data.sh to fetch the remaining task data. Python dependencies include transformers, seqeval, tensorboardx, jieba, kytea, pythainlp, and sacremoses.
Highlighted Details
Nine tasks spanning sentence classification, structured prediction, and question answering; 40 typologically diverse languages; evaluation by zero-shot transfer from English fine-tuning.
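As a quick environment sanity check after running the install script, the listed Python dependencies can be probed for importability. This is an illustrative snippet, not part of the repository; note that installed package names and import names can differ (e.g. tensorboardx imports as tensorboardX), and kytea is a separate tool with its own Python binding, so it is omitted here.

```python
# Check which of the README's Python dependencies are importable in the
# current environment, without actually importing them.
import importlib.util

required = ["transformers", "seqeval", "tensorboardX",
            "jieba", "pythainlp", "sacremoses"]
missing = [pkg for pkg in required
           if importlib.util.find_spec(pkg) is None]
print("missing:", missing)
```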
Maintenance & Community
This project is from Google Research. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
The repository itself is not explicitly licensed, but it references and utilizes datasets with their own licenses. Users should verify compatibility for commercial use or closed-source linking based on the individual dataset licenses.
Limitations & Caveats
The README notes that automatically translated test sets are "noisy and should not be treated as ground truth." The benchmark's focus is specifically on zero-shot transfer from English, which may not cover all desired cross-lingual evaluation scenarios.
Last updated 2 years ago; the repository appears inactive.