lanxiang1017/GEM: Multimodal LLM for grounded ECG understanding
GEM addresses two limitations of current Multimodal Large Language Models (MLLMs) in ECG interpretation: insufficient synergy between ECG time series and ECG images, and a lack of explainability. It targets researchers and engineers in medical AI and ECG analysis, offering a unified approach to grounded, clinician-aligned ECG interpretation with improved predictive performance, explainability, and evidence-based reasoning.
How It Works
GEM employs a dual-encoder framework to extract complementary features from ECG time-series signals and 12-lead ECG images, and cross-modal alignment of these features supports joint multimodal understanding. A key innovation is knowledge-guided instruction data generation, which produces high-granularity grounding data (ECG-Grounding) that links each diagnosis to specific, measurable waveform parameters. This enables feature-grounded analysis and evidence-driven reasoning that mimic a clinician's diagnostic process.
Quick Start & Requirements
Installation involves cloning the repository and running a setup script:
git clone https://github.com/lanxiang1017/GEM.git
bash GEM/setup.sh
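A minimal sketch, assuming a POSIX-style shell with git and bash available, that wraps the two commands above so they can be re-run safely: the clone is skipped when the checkout already exists. The function name install_gem is hypothetical and not part of the GEM repository; setup.sh is invoked just as in the README, and what it installs is not described here.

```shell
#!/usr/bin/env bash
# Sketch only: install_gem is a HYPOTHETICAL helper, not part of the GEM repo.
GEM_REPO_URL="https://github.com/lanxiang1017/GEM.git"

install_gem() {
  # Target directory for the checkout; defaults to ./GEM as in the README.
  local target="${1:-GEM}"
  # Clone only when the checkout is missing, so re-running is harmless.
  if [ ! -d "$target" ]; then
    git clone "$GEM_REPO_URL" "$target" || return 1
  fi
  # Run the repository's own setup script, exactly as the README does.
  bash "$target/setup.sh"
}
```

Usage: install_gem clones into ./GEM and runs its setup script; install_gem path/to/GEM does the same for a custom location.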
Significant data preparation is required: several ECG time-series datasets (MIMIC-IV, PTB-XL, etc.) and image datasets (ECG-Grounding-Images, PTB-XL-Test-Images, etc.) must be downloaded and organized under a ./data directory. Pretrained components include an ECG encoder (ECG-CoCa) and an MLLM such as PULSE or LLaVA. To train, set the data paths in GEM/scripts/train_gem.sh and run the script. Evaluation scripts are provided for the ECG-Grounding and ECG-Bench benchmarks.
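Because training depends on the ./data layout being complete, a preflight check can catch missing folders before GEM/scripts/train_gem.sh is launched. This is a sketch only: the function check_data_dirs and the folder names in the usage line are hypothetical placeholders; substitute the exact directory names from the GEM README and the paths you set inside train_gem.sh.

```shell
#!/usr/bin/env bash
# Sketch only: check_data_dirs is a HYPOTHETICAL helper, not part of the GEM repo.
# Folder names passed to it are placeholders; use the layout GEM documents.

check_data_dirs() {
  local root="$1"
  shift
  local missing=0 name
  for name in "$@"; do
    # Report every expected dataset folder that does not exist under root.
    if [ ! -d "$root/$name" ]; then
      echo "missing: $root/$name"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero if anything is missing, so it can gate training.
  [ "$missing" -eq 0 ]
}
```

Usage (placeholder folder names): check_data_dirs ./data mimic-iv ptb-xl ecg-grounding-images && bash GEM/scripts/train_gem.sh. Failing fast here is cheaper than discovering a bad path after model loading has started.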
Required pretrained assets include the ECG encoder checkpoint (cpt_wfep_epoch_20.pt) and the MLLMs (PULSE, LLaVA). A multi-GPU setup is recommended to speed up interpretation generation during evaluation.
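One common way to use several GPUs for generation is to split the test set into shards and run one worker per GPU, each restricted to a single device through the standard CUDA_VISIBLE_DEVICES variable. The sketch below assumes a worker command that accepts a shard index and a shard count as its last two arguments; that interface and the function name run_sharded are hypothetical, and GEM's own evaluation scripts may handle multiple GPUs differently.

```shell
#!/usr/bin/env bash
# Sketch only: run_sharded is a HYPOTHETICAL helper, not part of the GEM repo.
# ASSUMPTION: the worker takes "<shard_id> <num_shards>" as its last two args.

run_sharded() {
  local num_gpus="$1"
  shift
  local k=0 pids="" pid status=0
  while [ "$k" -lt "$num_gpus" ]; do
    # Background job k sees only GPU k and is told which shard it owns.
    CUDA_VISIBLE_DEVICES="$k" "$@" "$k" "$num_gpus" &
    pids="$pids $!"
    k=$((k + 1))
  done
  # Wait for every worker; return non-zero if any of them failed.
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, run_sharded 4 bash my_eval_worker.sh would start four workers, where my_eval_worker.sh is a hypothetical script that generates interpretations for the shard it is given. Sharding fits this workload because each ECG's interpretation is generated independently, so workers need no inter-GPU communication.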
Maintenance & Community
Recent updates include the paper's acceptance at NeurIPS 2025 and the release of the GEM-7B model and the ECG-Grounding-30k data. The README does not mention any community channels (e.g., Discord, Slack).
Licensing & Compatibility
The README does not specify a license. Licensing terms should be clarified before commercial use or integration into closed-source projects.
Limitations & Caveats
Setup requires substantial data acquisition and organization, plus specific pretrained models. Evaluation may need a multi-GPU configuration to run efficiently. The absence of a stated license is a significant barrier to adoption.