BlueBERT provides pre-trained BERT models specifically for biomedical natural language processing tasks, leveraging PubMed abstracts and MIMIC-III clinical notes. It offers researchers and developers specialized language representations for improved performance on tasks like named entity recognition, relation extraction, and sentence similarity within the biomedical domain.
How It Works
BlueBERT builds upon the BERT architecture, pre-training it on a large corpus of biomedical text. This includes PubMed abstracts and clinical notes from MIMIC-III, exposing the model to domain-specific terminology and linguistic patterns. This specialized pre-training allows BlueBERT to capture nuances of biomedical language more effectively than general-domain models.
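The pre-training itself follows BERT's masked-language-model (MLM) objective: tokens are hidden and the model learns to predict them from context. A minimal, illustrative sketch of the standard BERT masking rule (roughly 15% of positions selected; of those, 80% replaced with [MASK], 10% with a random token, 10% left unchanged) — this is not the project's actual pipeline, just the scheme it inherits from BERT:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style MLM masking: select ~mask_prob of positions; of those,
    80% become [MASK], 10% a random vocab token, 10% stay unchanged."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # random replacement
            else:
                masked.append(tok)  # kept as-is, but still scored
        else:
            labels.append(None)  # position not scored by the MLM loss
            masked.append(tok)
    return masked, labels

tokens = "the patient was administered metformin for type 2 diabetes".split()
masked, labels = mask_tokens(tokens, vocab=tokens)
```

Training on PubMed and MIMIC-III text means the masked positions are frequently biomedical terms, which is what pushes the model toward domain-specific representations.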
Quick Start & Requirements
- Installation: Pre-trained models are available on Hugging Face, e.g. https://huggingface.co/bionlp/bluebert_pubmed_uncased_L-12_H-768_A-12
- Prerequisites: Python and TensorFlow (implied by the .ckpt checkpoint files and the run_pretraining.py script); specific versions are not stated.
- Resources: Pre-trained models are large. Fine-tuning requires significant computational resources (GPU recommended).
- Documentation: Fine-tuning examples provided in the README for various tasks.
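Since the checkpoints are published on Hugging Face, they can also be loaded with the transformers library rather than the original TensorFlow code. A minimal sketch (assumes transformers and PyTorch are installed; the checkpoint is downloaded on first use):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Model ID from the Hugging Face link above (Base, uncased, PubMed-only)
model_id = "bionlp/bluebert_pubmed_uncased_L-12_H-768_A-12"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Metformin is used to treat type 2 diabetes.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: (batch, tokens, 768) for this Base-sized model
print(outputs.last_hidden_state.shape)
```

These embeddings can then be fed into a task head (NER, relation extraction, etc.) for fine-tuning.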
Highlighted Details
- Offers four pre-trained model variants: Base and Large, both uncased, trained on either PubMed alone or PubMed plus MIMIC-III.
- Includes code for fine-tuning on Sentence Similarity (STS), Named Entity Recognition (NER), Relation Extraction, and Document Multilabel Classification.
- Provides preprocessed PubMed texts and code for replicating the pre-training process.
- Models are available on Hugging Face for easier integration.
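For the sentence-similarity (STS) task, a common approach is to pool each sentence's token embeddings into a single vector and score pairs by cosine similarity. A self-contained sketch using placeholder arrays in place of real BlueBERT outputs (the shapes and pooling logic are assumptions, not the repository's exact fine-tuning code):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padding positions."""
    mask = attention_mask[:, :, None]           # (batch, tokens, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1)                   # non-padding token counts
    return summed / counts

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings standing in for model outputs: batch=2, tokens=4, dim=8
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],   # first sentence has one padding token
                 [1, 1, 1, 1]])

pooled = mean_pool(emb, mask)
score = cosine_sim(pooled[0], pooled[1])  # similarity score in [-1, 1]
```

With a real model, `emb` would be the last hidden states and `mask` the tokenizer's attention mask; an STS head can also be trained directly on pooled vectors instead of using raw cosine similarity.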
Maintenance & Community
- Last updated: November 1st, 2020 (per the Hugging Face model release).
- Project was formerly known as NCBI_BERT.
- No explicit community links (Discord, Slack) are provided in the README.
Licensing & Compatibility
- License: Not explicitly stated in the README. The code appears to be derived from the original BERT repository, which was Apache 2.0. However, the data sources (PubMed, MIMIC-III) have their own usage terms.
- Compatibility: Designed for use with TensorFlow.
Limitations & Caveats
- The project's last update was in late 2020, suggesting potential staleness regarding newer NLP techniques or library versions.
- Specific TensorFlow version requirements are not detailed.
- The README does not explicitly state the license for the BlueBERT models themselves, only referencing the original BERT code.