M3D by BAAI-DCAI

Multi-modal LLM for 3D medical image analysis

Created 1 year ago
423 stars

Top 69.8% on SourcePulse

Project Summary

M3D is a comprehensive framework for 3D medical image analysis using multi-modal large language models. It offers a large-scale dataset (M3D-Data), versatile pre-trained models (M3D-LaMed), and an extensive evaluation benchmark (M3D-Bench) covering tasks like retrieval, report generation, VQA, and segmentation. This project targets researchers and developers in medical AI, providing tools to advance diagnostic and analytical capabilities.

How It Works

M3D-LaMed models integrate a pre-trained vision encoder (M3D-CLIP) with large language models (Phi-3-4B, Llama-2-7B). The architecture processes 3D medical images, normalizing and reshaping them into a format compatible with the vision encoder. This encoded visual information is then fused with text prompts, enabling the LLM to perform various downstream tasks. This multi-modal approach allows for a deeper understanding of medical images by leveraging the contextual and generative power of LLMs.
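The prefix-style fusion described above can be sketched in a few lines. This is an illustrative stand-in, not M3D's actual code: the token counts, embedding dimensions, and the `fuse` function are all hypothetical, and the learned projection layer is replaced by a random matrix.

```python
import numpy as np

def fuse(vision_tokens: np.ndarray, text_tokens: np.ndarray) -> np.ndarray:
    """Project vision tokens into the LLM embedding space and prepend
    them to the text tokens, mimicking prefix-style multimodal fusion."""
    d_llm = text_tokens.shape[-1]
    rng = np.random.default_rng(0)
    # Stand-in for the trained vision-to-LLM projection layer.
    proj = rng.standard_normal((vision_tokens.shape[-1], d_llm)) * 0.02
    projected = vision_tokens @ proj          # map vision dim -> LLM dim
    return np.concatenate([projected, text_tokens], axis=0)

vis = np.zeros((256, 768))    # e.g. 256 visual tokens of dim 768 (assumed)
txt = np.zeros((16, 3072))    # e.g. 16 text-prompt embeddings (assumed)
fused = fuse(vis, txt)
print(fused.shape)            # (272, 3072): visual prefix + text tokens
```

The fused sequence is what the LLM attends over, letting text generation condition on the encoded 3D volume.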

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, SimpleITK, SimpleSliceViewer. Input 3D medical images must be preprocessed into .npy files, normalized to [0, 1], and shaped 1x32x256x256. A CUDA-capable GPU is recommended.
  • Demo: An online demo is available.
  • Docs: Project code and models are available on Hugging Face and ModelScope.
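A minimal preprocessing sketch for the input format above (normalize to [0, 1], resample to 1x32x256x256, save as .npy). The `preprocess_volume` helper and the min-max normalization choice are assumptions; the repo's own preprocessing scripts may use different intensity windowing, and the synthetic volume stands in for a real scan loaded with SimpleITK.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_volume(vol: np.ndarray) -> np.ndarray:
    """Min-max normalize a 3D volume and resample it to the
    1x32x256x256 (C, D, H, W) shape the models expect."""
    vol = vol.astype(np.float32)
    lo, hi = vol.min(), vol.max()
    vol = (vol - lo) / (hi - lo) if hi > lo else np.zeros_like(vol)
    target = (32, 256, 256)
    factors = [t / s for t, s in zip(target, vol.shape)]
    vol = zoom(vol, factors, order=1)   # linear resampling to target size
    return vol[np.newaxis, ...]         # add channel dim -> (1, 32, 256, 256)

# Synthetic CT-like volume of arbitrary size (stand-in for a SimpleITK load)
raw = np.random.default_rng(0).integers(-1000, 2000, size=(45, 300, 300))
out = preprocess_volume(raw)
print(out.shape)                        # (1, 32, 256, 256)
# np.save("volume.npy", out)            # the .npy format the repo expects
```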

Highlighted Details

  • M3D-Data: Largest-scale open-source 3D medical dataset with 120K image-text and 662K instruction-response pairs.
  • M3D-LaMed-Phi-3-4B: A recent, lightweight model outperforming larger models on some tasks.
  • M3D-Bench: Comprehensive benchmark covering 8 3D medical analysis tasks.
  • Supports image-text retrieval, report generation, VQA, and segmentation.

Maintenance & Community

The project is actively maintained, with recent updates including a new model release (M3D-LaMed-Phi-3-4B) and an online demo. Links to Hugging Face and ModelScope provide access to models and data.

Licensing & Compatibility

The project utilizes publicly available data from Radiopaedia, licensed for non-commercial use in machine learning. Citation is requested for use.

Limitations & Caveats

The segmentation task for the M3D-LaMed-Llama-2-7B model has known issues that are being addressed. While 2D images can theoretically be interpolated to the expected depth, the models are trained primarily on 3D data, so performance on such inputs is not guaranteed.
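One crude way to lift a 2D image into the expected 1x32x256x256 shape is to tile it along a new depth axis. This sketch (the `lift_2d_to_3d` helper is hypothetical, not part of the repo) illustrates the idea; a tiled slice is not equivalent to a real 3D scan, per the caveat above.

```python
import numpy as np

def lift_2d_to_3d(img: np.ndarray, depth: int = 32) -> np.ndarray:
    """Normalize a 2D slice (H, W) to [0, 1] and repeat it `depth`
    times so it matches the 1 x depth x H x W model input."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    vol = np.repeat(img[np.newaxis, ...], depth, axis=0)  # (depth, H, W)
    return vol[np.newaxis, ...]                           # (1, depth, H, W)

xray = np.random.default_rng(1).random((256, 256))  # stand-in 2D image
lifted = lift_2d_to_3d(xray)
print(lifted.shape)  # (1, 32, 256, 256)
```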

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 6 stars in the last 30 days
