M3D by BAAI-DCAI

Multi-modal LLM for 3D medical image analysis

Created 1 year ago · 355 stars · Top 79.7% on sourcepulse

Project Summary

M3D is a comprehensive framework for 3D medical image analysis using multi-modal large language models. It offers a large-scale dataset (M3D-Data), versatile pre-trained models (M3D-LaMed), and an extensive evaluation benchmark (M3D-Bench) covering tasks like retrieval, report generation, VQA, and segmentation. This project targets researchers and developers in medical AI, providing tools to advance diagnostic and analytical capabilities.

How It Works

M3D-LaMed models integrate a pre-trained vision encoder (M3D-CLIP) with large language models (Phi-3-4B, Llama-2-7B). The architecture processes 3D medical images, normalizing and reshaping them into a format compatible with the vision encoder. This encoded visual information is then fused with text prompts, enabling the LLM to perform various downstream tasks. This multi-modal approach allows for a deeper understanding of medical images by leveraging the contextual and generative power of LLMs.
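The fusion step described above can be sketched in miniature: the vision encoder turns the 3D volume into a fixed number of visual embeddings, which are spliced into the tokenized prompt in place of an image placeholder before the LLM sees the sequence. This is a conceptual sketch only — the `<vis_i>` tokens, `NUM_VISUAL_TOKENS`, and the `<image>` marker are illustrative placeholders, not M3D's actual tokenizer vocabulary; the real pipeline uses learned M3D-CLIP embeddings passed through a projection layer.

```python
NUM_VISUAL_TOKENS = 4  # illustrative; real models emit many more visual tokens

def encode_volume(volume):
    """Stand-in for the vision encoder: emit visual-token placeholders."""
    return [f"<vis_{i}>" for i in range(NUM_VISUAL_TOKENS)]

def build_input_sequence(prompt_tokens, volume):
    """Splice visual tokens into the prompt in place of the <image> marker."""
    visual = encode_volume(volume)
    out = []
    for tok in prompt_tokens:
        if tok == "<image>":
            out.extend(visual)  # image placeholder expands to visual tokens
        else:
            out.append(tok)
    return out

prompt = ["<image>", "Describe", "the", "lesion", "."]
seq = build_input_sequence(prompt, volume=None)
print(seq)  # ['<vis_0>', '<vis_1>', '<vis_2>', '<vis_3>', 'Describe', 'the', 'lesion', '.']
```

The key design point is that, after this splice, the LLM treats visual and text embeddings uniformly as one input sequence, which is what lets a single model handle report generation, VQA, and the other downstream tasks.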

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, SimpleITK, and SimpleSliceViewer. Input 3D medical images must be preprocessed into .npy format, normalized to the [0, 1] range, and shaped as 1x32x256x256. A CUDA-capable GPU is recommended for performance.
  • Demo: An online demo is available.
  • Docs: Project code and models are available on Hugging Face and ModelScope.
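The preprocessing requirement above (min-max normalization to [0, 1], plus reshaping to a 1x32x256x256 channel-depth-height-width layout) can be sketched with pure Python on a tiny volume. Real pipelines would use SimpleITK and NumPy (and `numpy.save` for the .npy file); the function names here are illustrative, not part of M3D's API.

```python
def minmax_normalize(voxels):
    """Scale a flat list of voxel intensities into [0, 1]."""
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        return [0.0] * len(voxels)  # constant volume: avoid divide-by-zero
    return [(v - lo) / (hi - lo) for v in voxels]

def to_model_shape(flat, depth, height, width):
    """Reshape a flat list into nested (1, depth, height, width) lists."""
    assert len(flat) == depth * height * width
    it = iter(flat)
    vol = [[[next(it) for _ in range(width)]
            for _ in range(height)]
           for _ in range(depth)]
    return [vol]  # leading channel dimension of 1

raw = list(range(8))                 # tiny 2x2x2 volume for illustration
norm = minmax_normalize(raw)         # intensities scaled into [0, 1]
vol = to_model_shape(norm, 2, 2, 2)  # real target shape is (1, 32, 256, 256)
```

With NumPy the last step would be a single `arr.reshape(1, 32, 256, 256)` followed by `numpy.save(path, arr)` to produce the .npy file the models expect.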

Highlighted Details

  • M3D-Data: the largest open-source 3D medical dataset to date, with 120K image-text pairs and 662K instruction-response pairs.
  • M3D-LaMed-Phi-3-4B: a recent lightweight model that outperforms larger models on some tasks.
  • M3D-Bench: a comprehensive benchmark covering eight 3D medical image analysis tasks.
  • Supports image-text retrieval, report generation, VQA, and segmentation.

Maintenance & Community

The project is actively maintained, with recent updates including a new model release (M3D-LaMed-Phi-3-4B) and an online demo. Links to Hugging Face and ModelScope provide access to models and data.

Licensing & Compatibility

The project uses publicly available data from Radiopaedia, which is licensed for non-commercial machine-learning use. Citation is requested when using the project.

Limitations & Caveats

The segmentation task for the M3D-LaMed-Llama-2-7B model has known issues that are being addressed. While 2D images can in principle be handled via interpolation, the models are trained primarily on 3D data.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 53 stars in the last 90 days
