Multi-modal LLM for 3D medical image analysis
M3D is a comprehensive framework for 3D medical image analysis using multi-modal large language models. It offers a large-scale dataset (M3D-Data), versatile pre-trained models (M3D-LaMed), and an extensive evaluation benchmark (M3D-Bench) covering tasks like retrieval, report generation, VQA, and segmentation. This project targets researchers and developers in medical AI, providing tools to advance diagnostic and analytical capabilities.
How It Works
M3D-LaMed models integrate a pre-trained vision encoder (M3D-CLIP) with large language models (Phi-3-4B, Llama-2-7B). The architecture processes 3D medical images, normalizing and reshaping them into a format compatible with the vision encoder. This encoded visual information is then fused with text prompts, enabling the LLM to perform various downstream tasks. This multi-modal approach allows for a deeper understanding of medical images by leveraging the contextual and generative power of LLMs.
Quick Start & Requirements
Install dependencies with:

pip install -r requirements.txt

Input images must be in .npy format, normalized to 0-1, and shaped as 1x32x256x256. A GPU with CUDA is recommended for performance.
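As a concrete example, the following is a minimal preprocessing sketch under those requirements. The helper name, the use of scipy for resampling, and the random stand-in volume are assumptions for illustration, not the project's official preprocessing script.

# Minimal preprocessing sketch (assumed workflow): min-max normalize a 3D volume
# to 0-1, resample it to 32x256x256, add a channel dimension, and save as .npy.
import numpy as np
from scipy import ndimage

def preprocess_volume(volume: np.ndarray) -> np.ndarray:
    """Normalize a 3D volume to 0-1 and resample it to shape (1, 32, 256, 256)."""
    volume = volume.astype(np.float32)
    # Min-max normalization to the 0-1 range expected by the models.
    volume = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    # Resample depth/height/width to 32x256x256 with linear interpolation.
    target = (32, 256, 256)
    factors = [t / s for t, s in zip(target, volume.shape)]
    volume = ndimage.zoom(volume, factors, order=1)
    # Add the leading channel dimension -> (1, 32, 256, 256).
    return volume[np.newaxis, ...]

raw = np.random.rand(64, 512, 512)  # stand-in for a loaded CT/MRI volume
np.save("example_image.npy", preprocess_volume(raw))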
Highlighted Details
Maintenance & Community
The project is actively maintained, with recent updates including a new model release (M3D-LaMed-Phi-3-4B) and an online demo. Links to Hugging Face and ModelScope provide access to models and data.
Licensing & Compatibility
The project utilizes publicly available data from Radiopaedia, licensed for non-commercial use in machine learning. Citation is requested for use.
Limitations & Caveats
The segmentation task for the M3D-LaMed-Llama-2-7B model has known issues that are being addressed. While 2D images can in principle be interpolated to the expected 3D input shape, the models are trained primarily on 3D data.