Medical vision-language model for comprehension/generation, per research paper
HealthGPT is a medical Large Vision-Language Model (LVLM) designed for unified medical visual comprehension and generation. It targets researchers and practitioners in the medical AI domain, offering a single framework to process and generate medical data based on both text and image inputs.
How It Works
HealthGPT employs a heterogeneous low-rank adaptation (H-LoRA) technique and a three-stage learning strategy to adapt pre-trained LLMs for visual tasks. Its architecture features hierarchical visual perception and a task-specific hard router to select relevant visual features and H-LoRA plugins, enabling autoregressive text and image generation.
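HealthGPT's actual implementation is not reproduced here, but the core idea of task-specific low-rank plugins selected by a hard router can be sketched in a few lines. The class and parameter names below (`HLoRAPlugin`, `RoutedLayer`, the `"comprehension"`/`"generation"` task keys) are illustrative assumptions, not the project's real API:

```python
import numpy as np

class HLoRAPlugin:
    """One low-rank adapter applied on top of a frozen base layer:
    delta = (x @ A^T) @ B^T * scale. Names are illustrative only."""
    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(0, 0.02, (rank, d_in))   # down-projection
        self.B = np.zeros((d_out, rank))             # up-projection (zero-init)
        self.scale = 1.0

    def delta(self, x):
        # Low-rank update added to the frozen base output
        return (x @ self.A.T) @ self.B.T * self.scale

class RoutedLayer:
    """Frozen base weight plus per-task H-LoRA plugins, chosen by a
    hard router (here simplified to an explicit task id)."""
    def __init__(self, d_in, d_out, rank=4):
        rng = np.random.default_rng(42)
        self.W = rng.normal(0, 0.02, (d_out, d_in))  # frozen base weights
        self.plugins = {
            "comprehension": HLoRAPlugin(d_in, d_out, rank, seed=1),
            "generation":    HLoRAPlugin(d_in, d_out, rank, seed=2),
        }

    def forward(self, x, task):
        base = x @ self.W.T
        return base + self.plugins[task].delta(x)

layer = RoutedLayer(d_in=16, d_out=8)
x = np.ones((1, 16))
out_c = layer.forward(x, task="comprehension")  # comprehension plugin
out_g = layer.forward(x, task="generation")     # generation plugin
```

Only the small `A`/`B` matrices of each plugin would be trained; the base weights stay frozen, which is what makes adapting a large pre-trained LLM to multiple medical tasks tractable.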
Quick Start & Requirements
Create a conda environment (`conda create -n HealthGPT python=3.10`), activate it (`conda activate HealthGPT`), and install dependencies (`pip install -r requirements.txt`). Run inference via the provided scripts (`com_infer.sh`, `gen_infer.sh`) or the equivalent Python commands, specifying paths to the downloaded weights and model configurations. A Gradio UI is available via `python app.py`.
Highlighted Details
Maintenance & Community
The project is associated with multiple academic institutions and Alibaba. The README indicates ongoing development with planned releases for training scripts and a project website.
Licensing & Compatibility
Licensed under Apache License 2.0, permitting commercial use and closed-source linking.
Limitations & Caveats
Full training scripts and complete H-LoRA weights for generation tasks are not yet released. The project builds on recent large backbones (e.g., Qwen2.5-32B-Instruct), which carry their own substantial hardware requirements for inference.