MMed-RAG by richard-peng-xia

Enhancing medical vision-language models with multimodal RAG

Created 1 year ago
270 stars

Top 95.1% on SourcePulse

Project Summary

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

MMed-RAG is a versatile multimodal Retrieval-Augmented Generation (RAG) system designed to improve the factuality and reliability of Medical Large Vision-Language Models (Med-LVLMs). It targets researchers and practitioners in medical AI who need more trustworthy model outputs from complex medical imaging and textual data. By integrating a domain-aware retrieval mechanism, MMed-RAG reports factual accuracy improvements of up to 43.8%, addressing key limitations of current Med-LVLMs and making them more dependable for clinical decision support.

How It Works

The core of the system is a domain-aware retrieval mechanism that improves alignment across medical specialties including radiology, pathology, and ophthalmology. MMed-RAG tackles three alignment challenges inherent in multimodal RAG:

  • It discourages models from blindly copying retrieved text, encouraging them to rely on their own visual reasoning for complex problem-solving.
  • When the model is uncertain, it guides the model to retrieve and effectively use relevant external knowledge, improving accuracy and reducing errors.
  • It prevents the model from being misled by incorrect retrieved information, minimizing inaccurate medical diagnoses or reports.

Together, these measures ensure retrieved knowledge serves as a reliable supplement to the model's internal reasoning.
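To make these three behaviors concrete, here is a minimal conceptual sketch of a domain-aware retrieval step. This is not the repository's actual API: the domain classifier, per-domain retriever interface, and threshold values are illustrative assumptions.

```python
# Hypothetical sketch (not MMed-RAG's real code): retrieve external context only
# when the model is uncertain, route to a domain-specific retriever, and filter
# weak matches so incorrect retrievals do not mislead the final answer.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RetrievedContext:
    text: str     # retrieved report snippet or caption
    score: float  # retriever similarity score in [0, 1]


def domain_aware_retrieve(
    image_features,                     # visual features of the medical image
    question: str,
    classify_domain: Callable,          # assumed: returns "radiology" | "pathology" | "ophthalmology"
    retrievers: Dict[str, object],      # assumed: one retriever per domain, each with a .search() method
    model_confidence: float,            # Med-LVLM's own confidence in answering without retrieval
    confidence_threshold: float = 0.7,  # assumed value: retrieve only when the model is uncertain
    min_score: float = 0.5,             # assumed value: drop weak matches
) -> List[RetrievedContext]:
    # 1) Confident model: rely on its own visual reasoning rather than copying external text.
    if model_confidence >= confidence_threshold:
        return []

    # 2) Uncertain model: route the query to the retriever for the detected medical domain.
    domain = classify_domain(image_features, question)
    candidates = retrievers[domain].search(image_features, question, top_k=5)

    # 3) Keep only high-scoring contexts so incorrect retrievals do not derail the answer.
    return [c for c in candidates if c.score >= min_score]
```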

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment with Python 3.10 (conda create -n MMed-RAG python=3.10 -y), activate it, and install dependencies with pip install --upgrade pip, pip install -r requirements.txt, and pip install trl; the full command sequence is collected after this list.
  • Prerequisites: Requires the LLaVA-Med-1.5 model checkpoint, plus access to and downloads of the medical datasets MIMIC-CXR, IU-Xray, Harvard-FairVLMed, PMC-OA, and Quilt-1M.
  • Links: Paper, X (Twitter), Official GitHub Repo.
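For convenience, the installation steps from the list above collected into a single shell block. The clone URL is inferred from the author and project name shown on this page and may need adjusting.

```bash
# Clone the repository (URL inferred from the author/project name) and enter it
git clone https://github.com/richard-peng-xia/MMed-RAG.git
cd MMed-RAG

# Create and activate the conda environment with Python 3.10
conda create -n MMed-RAG python=3.10 -y
conda activate MMed-RAG

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install trl
```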

Highlighted Details

  • Achieves up to a 43.8% boost in factuality for Medical Vision-Language Models.
  • Addresses key multimodal RAG alignment challenges: self-reasoning over copying, intelligent knowledge retrieval when uncertain, and avoiding interference from incorrect retrievals.
  • Features a domain-aware retrieval mechanism tailored for medical imaging modalities (radiology, pathology, ophthalmology).

Maintenance & Community

The project was accepted to ICLR 2025, and its training scripts and data were released in late 2024. The manuscript is available on arXiv. MMed-RAG builds on code from LLaVA-Med, RULE, and CARES, so it both benefits from and depends on those upstream projects.

Licensing & Compatibility

The provided README does not specify a software license, leaving usage rights unclear, particularly for commercial applications or integration into closed-source systems.

Limitations & Caveats

Access to several required medical datasets (MIMIC-CXR, IU-Xray, Harvard-FairVLMed, PMC-OA, Quilt-1M) requires an application and approval process, which can be a barrier to adoption. The system's performance also depends on the quality of the base Med-LVLM and of the retrieved information.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 22 stars in the last 30 days
