Med-BERT by ZhiGroup

Contextualized embedding model for structured EHR data

created 5 years ago
294 stars

Top 90.9% on sourcepulse

Project Summary

Med-BERT provides pre-trained contextualized embeddings for structured Electronic Health Records (EHR) data, specifically diagnosis codes in ICD-9 and ICD-10 formats. It aims to improve disease prediction performance for researchers and practitioners working with large-scale EHR datasets.

How It Works

Med-BERT adapts the BERT framework to structured EHR data, pre-training embeddings on the records of more than 28 million patients. Transformer layers capture relationships and context across a patient's diagnosis history, and the project reports improved disease prediction performance over existing models.
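To make the idea of feeding diagnosis histories to a BERT-style model concrete, here is a minimal sketch of how a patient's visits might be flattened into token IDs, visit-position IDs, and an attention mask. It is an illustration under stated assumptions, not the repository's preprocessing: the names `build_vocab` and `encode_patient`, the `[PAD]`/`[UNK]` tokens, and the toy patients are all hypothetical.

```python
from typing import Dict, List, Tuple

PAD, UNK = "[PAD]", "[UNK]"  # hypothetical special tokens

Visits = List[List[str]]  # one patient: a list of visits, each a list of codes


def build_vocab(patients: List[Visits]) -> Dict[str, int]:
    """Assign an integer id to every diagnosis code seen in the data."""
    vocab = {PAD: 0, UNK: 1}
    for visits in patients:
        for visit in visits:
            for code in visit:
                vocab.setdefault(code, len(vocab))
    return vocab


def encode_patient(
    visits: Visits, vocab: Dict[str, int], max_len: int = 8
) -> Tuple[List[int], List[int], List[int]]:
    """Flatten visits into (input_ids, visit_ids, attention_mask)."""
    input_ids, visit_ids = [], []
    for visit_no, visit in enumerate(visits, start=1):
        for code in visit:
            input_ids.append(vocab.get(code, vocab[UNK]))
            visit_ids.append(visit_no)  # which visit each code came from
    # Truncate, then pad every sequence to the same fixed length.
    input_ids, visit_ids = input_ids[:max_len], visit_ids[:max_len]
    mask = [1] * len(input_ids)
    pad = max_len - len(input_ids)
    return input_ids + [vocab[PAD]] * pad, visit_ids + [0] * pad, mask + [0] * pad


# Toy data: two patients, codes chosen only for illustration.
patients = [
    [["E11.9", "I10"], ["I50.9"]],
    [["250.00"], ["428.0", "I10"]],
]
vocab = build_vocab(patients)
ids, segs, mask = encode_patient(patients[0], vocab, max_len=5)
print(ids, segs, mask)
```

The visit-position IDs play a role similar to BERT's segment IDs: they tell the model which codes were recorded together, which plain token order alone would not.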

Quick Start & Requirements

  • Pretraining: Requires Python 3.7+, PyTorch 1.5.0, TensorFlow 1.13.1+, Pandas, Pickle, tqdm, and pytorch-transformers. Pretraining involves several Python scripts for data preprocessing, feature creation, and running the training loop.
  • Fine-tuning: Uses a similar data preparation script (create_ehr_pretrain_FTdata.py) and can be followed via a provided DHF prediction notebook.
  • Hardware: Primarily tested on GPU; CPU/TPU options may exist but are untested.

Highlighted Details

  • Achieves meaningful performance boosts on real-world disease prediction problems.
  • Pre-trained on a large-scale EHR dataset (28,490,650 patients).
  • Supports ICD-9 and ICD-10 diagnosis code formats.
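Supporting both ICD-9 and ICD-10 in one vocabulary raises a practical question: the two code systems can produce the same string once punctuation is removed. The sketch below shows one possible way to keep them apart by tagging each code with its version. The `normalize_code` helper and the `ICD9:`/`ICD10:` prefix scheme are assumptions made for illustration and are not taken from the Med-BERT code.

```python
def normalize_code(code: str, version: int) -> str:
    """Return a version-tagged token for a raw diagnosis code.

    Hypothetical scheme: strip whitespace and dots, uppercase,
    then prefix with the ICD version so codes cannot collide.
    """
    if version not in (9, 10):
        raise ValueError(f"unsupported ICD version: {version}")
    cleaned = code.strip().upper().replace(".", "")
    return f"ICD{version}:{cleaned}"


# Formatting variants of the same code collapse to one token.
print(normalize_code("e11.9", 10), normalize_code(" E119 ", 10))

# An ICD-9 code and an ICD-10 code that look alike without dots stay distinct.
print(normalize_code("E880", 9), normalize_code("E88.0", 10))
```

Without the version prefix, the last two codes would both become `E880` and share a single embedding.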

Maintenance & Community

  • Contact via GitHub issues for questions.
  • Citation provided for the associated paper.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Pre-trained models are not sharable due to data vendor contract restrictions.

Limitations & Caveats

The pre-trained models are not available for download due to contractual limitations with data vendors. The code was primarily tested on GPU, with CPU and TPU support being untested.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 13 stars in the last 90 days
