clinicalBERT by kexinhuang12345

Clinical notes model for hospital readmission prediction

created 6 years ago
424 stars

Project Summary

This repository provides pretraining and fine-tuning weights for ClinicalBERT, a contextual representation model specifically designed for clinical notes. It addresses the challenge of extracting meaningful insights from unstructured clinical text to predict hospital readmission, targeting researchers and practitioners in clinical NLP and healthcare analytics.

How It Works

ClinicalBERT uses the BERT architecture, pretrained on a large corpus of clinical notes. Pretraining on in-domain text lets the model capture domain-specific language and contextual relationships within clinical text. This improves performance on downstream tasks such as hospital readmission prediction compared with general-purpose language models.
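BERT-style encoders accept a bounded input length (512 WordPiece tokens for standard BERT-Base), and many clinical notes are longer, so notes are usually split into chunks before encoding. The sketch below illustrates that step under stated assumptions: `chunk_note` and its default window size are hypothetical, and it splits on whitespace rather than using BERT's WordPiece tokenizer. Each chunk keeps the note's ID and Label so that chunk-level outputs can later be grouped back by admission.

```python
from typing import Dict, List


def chunk_note(note_id: str, text: str, label: int, max_tokens: int = 256) -> List[Dict[str, object]]:
    """Split one note into rows of at most `max_tokens` whitespace tokens.

    Hypothetical helper: each row mirrors the TEXT/ID/Label columns
    described for the data files, and every chunk inherits the note's
    ID and Label.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()  # whitespace split; a real pipeline would use WordPiece
    rows: List[Dict[str, object]] = []
    for start in range(0, len(tokens), max_tokens):
        rows.append({
            "TEXT": " ".join(tokens[start:start + max_tokens]),
            "ID": note_id,
            "Label": label,
        })
    return rows


if __name__ == "__main__":
    note = "pt admitted with chf exacerbation diuresed and discharged home"
    for row in chunk_note("adm_001", note, label=1, max_tokens=4):
        print(row)
```

An empty note yields no rows, and the last chunk may be shorter than `max_tokens`.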

Quick Start & Requirements

  • Install: pip install pytorch-pretrained-bert
  • Data: Requires the MIMIC-III dataset, organized into data/discharge, data/3days, and data/2days directories of CSV files with "TEXT", "ID", and "Label" columns. Access to MIMIC-III requires completing the CITI training program.
  • Weights: Download pre-trained weights from a provided Google link.
  • Scripts: Python scripts are available for evaluation and training.
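Before running the scripts, it can help to confirm that the data directories match the layout described above. This is a minimal sketch, not part of the repository: `check_data_layout` is a hypothetical helper that only checks that each expected subdirectory exists and that every CSV header in it includes TEXT, ID, and Label. The exact file names the scripts expect are not checked.

```python
import csv
from pathlib import Path
from typing import Dict, List

REQUIRED_COLUMNS = {"TEXT", "ID", "Label"}
EXPECTED_SUBDIRS = ("discharge", "3days", "2days")


def check_data_layout(root: str) -> Dict[str, List[str]]:
    """Return problems found per subdirectory; an empty dict means the layout looks OK."""
    problems: Dict[str, List[str]] = {}
    for sub in EXPECTED_SUBDIRS:
        folder = Path(root) / sub
        issues: List[str] = []
        if not folder.is_dir():
            issues.append("missing directory")
        else:
            csv_files = sorted(folder.glob("*.csv"))
            if not csv_files:
                issues.append("no CSV files")
            for path in csv_files:
                # Read only the header row of each CSV file.
                with path.open(newline="", encoding="utf-8") as handle:
                    header = next(csv.reader(handle), [])
                missing = REQUIRED_COLUMNS - set(header)
                if missing:
                    issues.append(f"{path.name}: missing {sorted(missing)}")
        if issues:
            problems[sub] = issues
    return problems


if __name__ == "__main__":
    report = check_data_layout("data")
    print("layout OK" if not report else report)
```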

Highlighted Details

  • Pretrained weights for ClinicalBERT and fine-tuned models for hospital readmission prediction are available.
  • Scripts are provided for predicting 30-day hospital readmissions using early notes or discharge summaries.
  • Includes instructions and scripts for pretraining ClinicalBERT and Clinical XLNet from scratch.
  • Offers visualization notebooks for self-attention and downloadable Word2Vec/FastText models for clinical notes.
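When a note is split into several chunks that share one ID, the chunk-level probabilities must be combined into a single readmission score per admission. The sketch below uses the max/mean scaling rule that, as we understand it, the associated paper describes: score = (P_max + P_mean · n/c) / (1 + n/c), where n is the number of chunks and c is a scaling factor (c = 2 is assumed here). The function name `aggregate_by_id` is hypothetical, and this is not the repository's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_by_id(chunk_scores: Iterable[Tuple[str, float]], c: float = 2.0) -> Dict[str, float]:
    """Combine per-chunk probabilities into one score per ID.

    score = (max_p + mean_p * n / c) / (1 + n / c), with n chunks per ID.
    The scaling factor c = 2 is an assumption.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for note_id, prob in chunk_scores:
        grouped[note_id].append(prob)
    result: Dict[str, float] = {}
    for note_id, probs in grouped.items():
        n = len(probs)
        p_max = max(probs)
        p_mean = sum(probs) / n
        result[note_id] = (p_max + p_mean * n / c) / (1 + n / c)
    return result


if __name__ == "__main__":
    scores = [("adm_001", 0.9), ("adm_001", 0.1), ("adm_002", 0.4)]
    print(aggregate_by_id(scores))
```

With a single chunk the score equals that chunk's probability. As n grows, the mean receives more weight, which tempers a single high-scoring chunk in a long note.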

Maintenance & Community

  • Contact: Email kh2383@nyu.edu for assistance or to report issues.
  • Citation: The README links to the associated paper on arXiv.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for clinical notes in general, although fine-tuning on the target dataset is recommended.

Limitations & Caveats

The README does not state a license, which leaves commercial-use rights unclear. Accessing MIMIC-III data requires completing the CITI training program.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days