Benchmark for biomedical text-mining tasks
The BLUE benchmark provides a comprehensive evaluation suite for biomedical natural language processing (NLP) tasks, targeting researchers and developers in the field. It offers ten diverse corpora across five text-mining tasks, enabling standardized comparison of NLP models on real-world biomedical data.
How It Works
BLUE comprises ten datasets covering sentence similarity, named entity recognition (NER), relation extraction, document classification, and inference. It leverages existing, widely-used corpora from the BioNLP community, ensuring relevance and comparability. The benchmark facilitates evaluation using standard metrics like F1-score and Pearson correlation, with preprocessed data and BERT-formatted datasets available for ease of use.
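As a hedged sketch of the kind of scoring involved (not BLUE's actual evaluation scripts; the function names below are illustrative), the two metrics mentioned above can be computed from gold and predicted values like this:

```python
# Illustrative implementations of the metrics BLUE reports.
# These are minimal sketches, not the benchmark's own evaluation code.

def f1_score(gold, pred, positive=1):
    """F1 for a single positive label, e.g. for NER or relation extraction."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if p == positive and g != positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def pearson(x, y):
    """Pearson correlation, e.g. for sentence-similarity scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]), 3))  # → 0.667
print(round(pearson([0.1, 0.4, 0.8], [0.2, 0.5, 0.9]), 3))   # → 1.0
```

In practice one would use library implementations (e.g. scikit-learn's `f1_score` and scipy's `pearsonr`); the hand-rolled versions here just make the definitions explicit.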
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine, and Clinical Center.
Licensing & Compatibility
The repository does not explicitly state a license. However, the project is a product of the National Institutes of Health (NIH), which generally makes its research outputs publicly available.
Limitations & Caveats
The benchmark's state-of-the-art (SOTA) performance metrics are from April 2019. Some datasets (MedSTS, ShARe/CLEFE) require obtaining copies from external websites. The disclaimer notes that the tool's output is for research purposes and not for direct clinical use.
The last commit was made 3 years ago, and the repository is inactive.