HealthGPT by DCDmllm

Medical vision-language model for unified comprehension and generation, introduced in an accompanying research paper

created 5 months ago
1,481 stars

Top 28.3% on sourcepulse

Project Summary

HealthGPT is a medical Large Vision-Language Model (LVLM) designed for unified medical visual comprehension and generation. It targets researchers and practitioners in the medical AI domain, offering a single framework to process and generate medical data based on both text and image inputs.

How It Works

HealthGPT employs a heterogeneous low-rank adaptation (H-LoRA) technique and a three-stage learning strategy to adapt pre-trained LLMs for visual tasks. Its architecture features hierarchical visual perception and a task-specific hard router to select relevant visual features and H-LoRA plugins, enabling autoregressive text and image generation.
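
As a rough illustration of the H-LoRA idea, the sketch below (plain PyTorch, not the authors' code; all class and parameter names are hypothetical) freezes a base linear layer and lets a hard router pick exactly one task-specific low-rank adapter, e.g. comprehension vs. generation:

    # Minimal sketch of hard-routed, task-specific LoRA "plugins".
    # This is an assumption-laden illustration, not HealthGPT's implementation.
    import torch
    import torch.nn as nn

    class LoRAAdapter(nn.Module):
        """Low-rank update (alpha/r) * B @ A added to a frozen linear layer."""
        def __init__(self, in_features, out_features, r=8, alpha=16):
            super().__init__()
            self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(out_features, r))
            self.scale = alpha / r

        def forward(self, x):
            return (x @ self.A.T @ self.B.T) * self.scale

    class HLoRALinear(nn.Module):
        """Frozen base projection plus per-task adapters ("plugins").
        A hard router makes a discrete choice of adapter from the task id,
        rather than a learned soft mixture."""
        def __init__(self, in_features, out_features,
                     tasks=("comprehension", "generation")):
            super().__init__()
            self.base = nn.Linear(in_features, out_features)
            self.base.weight.requires_grad_(False)
            self.base.bias.requires_grad_(False)
            self.adapters = nn.ModuleDict(
                {t: LoRAAdapter(in_features, out_features) for t in tasks}
            )

        def forward(self, x, task):
            # Hard routing: select exactly one H-LoRA plugin by task id.
            return self.base(x) + self.adapters[task](x)

    layer = HLoRALinear(768, 768)
    x = torch.randn(2, 16, 768)
    print(layer(x, task="comprehension").shape)  # torch.Size([2, 16, 768])

In the actual model, routing operates over hierarchical visual features and H-LoRA plugins throughout the LLM; this sketch only captures the discrete plugin-selection mechanism.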

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment (conda create -n HealthGPT python=3.10), activate it (conda activate HealthGPT), and install dependencies (pip install -r requirements.txt); these steps are consolidated into a runnable snippet after this list.
  • Prerequisites: Requires downloading pre-trained weights for the visual encoder (clip-vit-large-patch14-336), base LLMs (Phi-3-mini-4k-instruct or Phi-4), and potentially VQGAN weights for generation. H-LoRA and adapter weights are also needed.
  • Inference: Run inference via provided shell scripts (com_infer.sh, gen_infer.sh) or Python commands, specifying paths to downloaded weights and model configurations. A Gradio UI is available via python app.py.
  • Resources: Specific hardware requirements are not documented, but LVLMs of this scale typically demand significant GPU memory.
  • Links: HuggingFace (for weights), VL-Health Dataset, Paper.
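
Putting the bullet points together, a typical end-to-end setup might look like this (the clone URL assumes the DCDmllm/HealthGPT repository named in the header; the scripts must first be edited to point at your downloaded weights):

    git clone https://github.com/DCDmllm/HealthGPT.git
    cd HealthGPT
    conda create -n HealthGPT python=3.10
    conda activate HealthGPT
    pip install -r requirements.txt

    # Comprehension or generation inference via the provided scripts:
    bash com_infer.sh
    bash gen_infer.sh

    # Or launch the Gradio UI:
    python app.py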

Highlighted Details

  • Supports 7 medical comprehension and 5 medical generation tasks.
  • The HealthGPT-XL32 variant (built on Qwen2.5-32B-Instruct) reports a score of 70.4, outperforming earlier HealthGPT versions.
  • Built upon LLaVA, LLaVA++, and Taming Transformers.

Maintenance & Community

The project is associated with multiple academic institutions and Alibaba. The README indicates ongoing development with planned releases for training scripts and a project website.

Licensing & Compatibility

Licensed under Apache License 2.0, permitting commercial use and closed-source linking.

Limitations & Caveats

Full training scripts and complete H-LoRA weights for generation tasks are not yet released. The project builds on recent large models (e.g., Qwen2.5-32B-Instruct), which carry substantial hardware demands of their own.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 161 stars in the last 90 days
