MMedAgent by Wangyixinxin

Medical AI agent for multimodal clinical tasks

Created 2 years ago

264 stars

Top 96.5% on SourcePulse

Project Summary

MMedAgent is a multimodal AI agent that handles complex medical tasks by selecting and calling specialized tools. It targets researchers and practitioners in medical AI, giving them a single platform for working with multiple medical imaging modalities and task types.

How It Works

This project builds upon LLaVA-PLUS and LLaVA-Med, forming a multimodal agent capable of processing various imaging modalities (MRI, CT, X-ray, Histology, Gross) alongside natural language. It achieves this by orchestrating a suite of integrated tools, including LLaVA-Med for VQA and classification, BiomedCLIP, Grounding DINO and MedSAM for image grounding and segmentation (with both bounding-box and text prompts), and ChatCAD for medical report generation and retrieval-augmented generation (RAG). This modular approach allows for flexible and comprehensive medical data analysis.
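The orchestration idea can be illustrated with a small dispatcher. This is a minimal sketch, not the project's code: the task labels, the `PIPELINES` table, and the stub tools are all hypothetical, and the task-to-tool pairing (including chaining Grounding DINO into MedSAM for text-prompted segmentation) is one reading of the paragraph above rather than a confirmed design.

```python
from typing import Callable, Dict, List


def make_stub(tool_name: str) -> Callable[[str, str], dict]:
    """Build a stand-in tool. A real worker would run model inference;
    this stub only records which tool handled the request."""
    def run(image_path: str, prompt: str) -> dict:
        return {"tool": tool_name, "image": image_path, "prompt": prompt}
    return run


# Assumed task-to-tool pipelines (hypothetical labels, illustrative pairing).
PIPELINES: Dict[str, List[str]] = {
    "vqa": ["LLaVA-Med"],
    "classification": ["BiomedCLIP"],
    "grounding": ["Grounding DINO"],
    "segmentation_box": ["MedSAM"],
    "segmentation_text": ["Grounding DINO", "MedSAM"],
    "report_generation": ["ChatCAD"],
}

# One stub per distinct tool name that appears in any pipeline.
TOOLS = {name: make_stub(name) for names in PIPELINES.values() for name in names}


def dispatch(task: str, image_path: str, prompt: str) -> List[dict]:
    """Run every tool in the pipeline for `task`, in order."""
    if task not in PIPELINES:
        raise ValueError(f"unsupported task: {task!r}")
    return [TOOLS[name](image_path, prompt) for name in PIPELINES[task]]
```

Calling `dispatch("segmentation_text", "scan.png", "segment the liver")` returns two records, one per tool in the chain. Keeping the routing in a table means a new tool can be added by extending `PIPELINES` without touching `dispatch`.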

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.10 Conda environment, activating it, and installing the package in editable mode (pip install -e .). Significant prerequisites include downloading base LLaMA model weights and applying provided delta weights to obtain LLaVA-Med checkpoints, as well as downloading specific tool checkpoints (e.g., GroundingDINO, MedSAM, ChatCAD). Training requires substantial GPU resources and CUDA. The setup for inference involves launching multiple tool worker services and a controller before starting the Gradio web server.
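The start-up order for inference can be sketched as a dependency graph. The service names below are hypothetical, and the sketch assumes that tool workers register with the controller and therefore start after it; the summary itself only states that workers and the controller must be running before the Gradio web server.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical service names. Each key maps to the services that must
# already be running before it starts.
DEPENDS_ON: Dict[str, Set[str]] = {
    "controller": set(),
    "tool_workers": {"controller"},  # assumption: workers register with the controller
    "gradio_web_server": {"controller", "tool_workers"},
}


def launch_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())
```

With the graph above, `launch_order(DEPENDS_ON)` yields the controller first, then the tool workers, then the web server. The actual launch commands should be taken from the project README.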

Highlighted Details

Supports a broad range of medical AI tasks: Visual Question Answering (VQA), Classification, Image Grounding, Segmentation (via bounding-box or text prompts), Medical Report Generation (MRG), and Retrieval Augmented Generation (RAG).
Integrates state-of-the-art models and tools tailored for medical applications, including LLaVA-Med, BiomedCLIP, Grounding DINO, MedSAM, and ChatCAD.
Handles diverse medical imaging modalities such as MRI, CT, X-ray, Histology, and Gross pathology images.
Provides the first open-source instruction-tuning dataset specifically designed for multimodal medical agents.

Maintenance & Community

The project indicates ongoing development with plans to extend tool lists and welcomes contributions. However, specific community channels (like Discord or Slack) or notable maintainer/contributor highlights are not detailed in the provided README.

Licensing & Compatibility

The README states that use of the LLaVA-Med checkpoints must comply with the license of the base LLM, LLaMA, which typically restricts commercial use. The license for the MMedAgent code itself is not explicitly stated, so review the terms carefully before using it in closed-source or commercial applications.

Limitations & Caveats

The provided demo link is temporary, so users should expect to build and host their own web UI and server. Obtaining usable model weights requires downloading the base LLaMA weights and applying the provided delta weights, a multi-step process that can be error-prone. Training the model demands significant GPU resources and a carefully configured environment.
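The delta-weight step reduces to adding each delta tensor to the base tensor of the same name. The following is a minimal sketch of that idea using plain Python lists in place of real tensors. The function name and the handling of delta-only tensors (copied through unchanged) are assumptions for illustration; the project's own conversion script should be used for real checkpoints.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # tensor name -> flattened values


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Reconstruct target weights as base + delta, tensor by tensor."""
    target: Weights = {}
    for name, delta_values in delta.items():
        if name not in base:
            # Assumption: tensors absent from the base model are stored
            # in full in the delta, so they are copied as-is.
            target[name] = list(delta_values)
            continue
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for tensor {name!r}")
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target
```

For example, a base tensor `[1.0, 2.0]` combined with a delta of `[0.5, -1.0]` gives `[1.5, 1.0]`. Distributing deltas rather than full weights lets a project share fine-tuned results without redistributing the base model, which is why the base weights must be obtained separately under their own license.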

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

9 stars in the last 30 days