BLUE_Benchmark by ncbi-nlp

Benchmark for biomedicine text-mining tasks

created 6 years ago
296 stars

Project Summary

The BLUE benchmark provides a comprehensive evaluation suite for biomedical natural language processing (NLP) tasks, targeting researchers and developers in the field. It offers ten diverse corpora across five text-mining tasks, enabling standardized comparison of NLP models on real-world biomedical data.

How It Works

BLUE comprises ten datasets covering sentence similarity, named entity recognition (NER), relation extraction, document classification, and inference. It draws on existing, widely used corpora from the BioNLP community, which keeps results relevant and comparable with prior work. Models are evaluated with standard metrics such as F1-score and Pearson correlation, and the data is available both preprocessed and in BERT-formatted form.
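As an illustration of the Pearson correlation mentioned above, here is a minimal sketch in plain Python. It is not the benchmark's own evaluation code; the function name `pearson` and the example scores are assumptions made up for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold similarity scores and model predictions (illustrative only).
gold_scores = [0.0, 1.5, 3.0, 4.5]
predicted_scores = [0.5, 1.0, 3.5, 4.0]
print(round(pearson(gold_scores, predicted_scores), 3))
```

A value near 1 means the predicted similarity scores rise and fall with the gold scores, which is what a sentence-similarity task rewards.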

Highlighted Details

  • Includes ten corpora across five biomedical NLP tasks: sentence similarity, NER, relation extraction, document classification, and inference.
  • Provides datasets in both raw and BERT-ready formats.
  • Offers baseline performance results using ELMo and NCBI_BERT models.
  • Updated DDI metric from micro-F1 to macro-F1 in August 2019.
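The difference between micro-F1 and macro-F1 noted in the last bullet can be sketched as follows. This is a generic illustration, not the benchmark's evaluation script: the function `micro_macro_f1`, the label names, and the toy data are all assumptions, and the sketch assumes single-label classification where only the classes listed in `labels` are scored.

```python
from collections import Counter


def micro_macro_f1(gold, pred, labels):
    """Return (micro_f1, macro_f1) over the given labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    # Macro: compute F1 per class, then take the unweighted mean.
    per_class = [f1(tp[label], fp[label], fn[label]) for label in labels]
    macro = sum(per_class) / len(labels)
    # Micro: pool the counts across classes, then compute one F1.
    micro = f1(
        sum(tp[label] for label in labels),
        sum(fp[label] for label in labels),
        sum(fn[label] for label in labels),
    )
    return micro, macro


# Toy data with hypothetical labels: one frequent class, one rare class.
gold = ["common", "common", "common", "common", "rare"]
pred = ["common", "common", "common", "common", "common"]
micro, macro = micro_macro_f1(gold, pred, labels=["common", "rare"])
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

In this toy case the model never predicts the rare class, so macro-F1 comes out well below micro-F1. Macro averaging weights every class equally, which makes it more sensitive to performance on infrequent classes.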

Maintenance & Community

The project is supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine, and Clinical Center.

Licensing & Compatibility

The repository does not explicitly state a license. However, the project is a product of the National Institutes of Health (NIH), which generally makes its research outputs publicly available.

Limitations & Caveats

The state-of-the-art (SOTA) results reported for the benchmark date from April 2019. Some datasets (MedSTS, ShARe/CLEFE) are not bundled and must be obtained from external websites. The project's disclaimer states that the tool's output is intended for research purposes, not for direct clinical use.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days
