GEM by lanxiang1017

Multimodal LLM for grounded ECG understanding

Created 11 months ago

258 stars

Top 98.1% on SourcePulse

Project Summary

GEM addresses two limitations of current Multimodal Large Language Models (MLLMs) in ECG interpretation: insufficient synergy between ECG time series and ECG images, and a lack of explainability. It targets researchers and engineers working in medical AI and ECG analysis, and offers a unified approach to grounded, clinician-aligned ECG interpretation with improved predictive performance, explainability, and evidence-based reasoning.

How It Works

GEM employs a dual-encoder framework to extract complementary features from ECG time series signals and 12-lead ECG images. Cross-modal alignment facilitates effective multimodal understanding. A key innovation is knowledge-guided instruction data generation, creating high-granularity grounding data (ECG-Grounding) that links diagnoses to specific, measurable waveform parameters. This approach enables feature-grounded analysis and evidence-driven reasoning, mimicking a clinician's diagnostic process.
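To make the idea of linking a diagnosis to measurable waveform parameters concrete, here is a minimal sketch in Python. It is not GEM's data format or pipeline: the `GroundedFinding` schema and both helper functions are hypothetical, and the only clinical fact used is the conventional adult rate thresholds (below 60 bpm is bradycardia, above 100 bpm is tachycardia).

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class GroundedFinding:
    """A diagnosis paired with the measurements that support it (hypothetical schema)."""
    diagnosis: str
    evidence: dict = field(default_factory=dict)


def heart_rate_bpm(rr_intervals_s):
    """Mean heart rate in beats per minute from RR intervals given in seconds."""
    return 60.0 / mean(rr_intervals_s)


def ground_rate_finding(rr_intervals_s):
    """Label the heart rate and attach the measurements used to decide.

    Conventional adult thresholds: below 60 bpm is bradycardia,
    above 100 bpm is tachycardia.
    """
    hr = heart_rate_bpm(rr_intervals_s)
    if hr > 100:
        label = "tachycardia"
    elif hr < 60:
        label = "bradycardia"
    else:
        label = "normal rate"
    evidence = {
        "heart_rate_bpm": round(hr, 1),
        "mean_rr_s": round(mean(rr_intervals_s), 3),
    }
    return GroundedFinding(label, evidence)


# Illustrative RR intervals (seconds between successive R peaks), not real data.
finding = ground_rate_finding([0.50, 0.52, 0.48, 0.50])
print(finding)
```

The point of the structure is that the label never travels without the numbers behind it, which is the property the ECG-Grounding data is described as providing at much finer granularity.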

Quick Start & Requirements

Installation involves cloning the repository and running a setup script:

git clone https://github.com/lanxiang1017/GEM.git
bash GEM/setup.sh

Substantial data preparation is required: multiple ECG time series datasets (MIMIC-IV, PTB-XL, etc.) and image datasets (ECG-Grounding-Images, PTB-XL-Test-Images, etc.) must be downloaded and organized under a ./data directory. The pretrained components are an ECG encoder (ECG-CoCa) and an MLLM such as PULSE or LLaVA. To train, set the data paths in GEM/scripts/train_gem.sh and run the script. Evaluation scripts are provided for the ECG-Grounding and ECG-Bench benchmarks.
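Because the data layout has many pieces, a quick pre-flight check before training can save a failed run. The Python sketch below is an assumption-laden illustration, not part of the repository: the entries in `EXPECTED` are placeholder names, and the real folder names must be taken from the project's README.

```python
from pathlib import Path

# Placeholder names only: replace these with the folder names that the
# repository's README actually specifies for ./data.
EXPECTED = [
    "ecg_timeseries",
    "ecg_images",
]


def missing_paths(root, expected):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_paths("./data", EXPECTED)
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("All expected entries found.")
```

Running a check like this before editing GEM/scripts/train_gem.sh makes it easier to tell a path typo from a dataset that was never downloaded.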

Prerequisites: Python, the datasets listed above, the pretrained ECG encoder checkpoint (cpt_wfep_epoch_20.pt), and an MLLM (PULSE or LLaVA). A multi-GPU setup is recommended for faster interpretation generation during evaluation.
Links: Project Page, Paper, Model, Data
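The multi-GPU recommendation for evaluation amounts to splitting the test samples so that each device generates interpretations for its own share. The Python sketch below shows one simple way to do that split (round-robin). It is an illustration under stated assumptions, not the repository's evaluation script: `shard_round_robin` and the sample IDs are hypothetical.

```python
def shard_round_robin(items, num_shards):
    """Split items into num_shards lists; item i goes to shard i % num_shards."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [items[i::num_shards] for i in range(num_shards)]


# Hypothetical sample IDs; a real run would read them from the benchmark files.
sample_ids = [f"ecg_{i:03d}" for i in range(10)]
shards = shard_round_robin(sample_ids, num_shards=4)

for gpu_index, shard in enumerate(shards):
    # Each worker process would typically be pinned to one device, for example
    # by setting the CUDA_VISIBLE_DEVICES environment variable before it starts.
    print(f"GPU {gpu_index}: {len(shard)} samples")
```

Round-robin keeps shard sizes within one sample of each other, so no single GPU becomes the bottleneck when samples take similar time to process.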

Highlighted Details

Accepted to NeurIPS 2025.
Reports significant improvements: CSN +7.4%, explainability +22.7%, grounding +25.3%.
Introduces the "Grounded ECG Understanding" task and benchmark.
Released GEM-7B model and ECG-Grounding-30k dataset.

Maintenance & Community

The project has seen recent updates, including NeurIPS 2025 acceptance and the release of the GEM-7B model and ECG-Grounding-30k data. No community channels (e.g., Discord, Slack) are explicitly mentioned in the README.

Licensing & Compatibility

The README does not specify a license. Licensing terms need to be clarified before commercial use or integration into closed-source projects.

Limitations & Caveats

Setup involves substantial data acquisition and organization and depends on specific pretrained models. Evaluation may need a multi-GPU configuration to run efficiently. The absence of a stated license is a significant adoption blocker.

Health Check

Last Commit: 4 days ago
Responsiveness: Inactive
Star History: 6 stars in the last 30 days