Benchmark for biomedical text-mining tasks
The BLUE benchmark provides a comprehensive evaluation suite for biomedical natural language processing (NLP) tasks, targeting researchers and developers in the field. It offers ten diverse corpora across five text-mining tasks, enabling standardized comparison of NLP models on real-world biomedical data.
How It Works
BLUE comprises ten datasets covering sentence similarity, named entity recognition (NER), relation extraction, document classification, and inference. It leverages existing, widely-used corpora from the BioNLP community, ensuring relevance and comparability. The benchmark facilitates evaluation using standard metrics like F1-score and Pearson correlation, with preprocessed data and BERT-formatted datasets available for ease of use.
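As a hedged sketch of the kind of scoring involved (not BLUE's actual evaluation scripts; the function names below are illustrative), the two metrics mentioned above can be computed from gold and predicted values like this:

```python
# Illustrative implementations of the metrics BLUE reports.
# These are minimal sketches, not the benchmark's own evaluation code.

def f1_score(gold, pred, positive=1):
    """F1 for a single positive label, e.g. for NER or relation extraction."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if p == positive and g != positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def pearson(x, y):
    """Pearson correlation, e.g. for sentence-similarity scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]), 3))  # → 0.667
print(round(pearson([0.1, 0.4, 0.8], [0.2, 0.5, 0.9]), 3))   # → 1.0
```

In practice one would use library implementations (e.g. scikit-learn's `f1_score` and scipy's `pearsonr`); the hand-rolled versions here just make the definitions explicit.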
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine, and Clinical Center.
Licensing & Compatibility
The repository does not explicitly state a license. However, the project is a product of the National Institutes of Health (NIH), which generally makes its research outputs publicly available.
Limitations & Caveats
The benchmark's state-of-the-art (SOTA) performance metrics are from April 2019. Some datasets (MedSTS, ShARe/CLEFE) require obtaining copies from external websites. The disclaimer notes that the tool's output is for research purposes and not for direct clinical use.
The last commit was made 3 years ago, and the repository is inactive.