Medical vision-language model for comprehension/generation, per research paper
HealthGPT is a medical Large Vision-Language Model (LVLM) designed for unified medical visual comprehension and generation. It targets researchers and practitioners in the medical AI domain, offering a single framework to process and generate medical data based on both text and image inputs.
How It Works
HealthGPT employs a heterogeneous low-rank adaptation (H-LoRA) technique and a three-stage learning strategy to adapt pre-trained LLMs for visual tasks. Its architecture features hierarchical visual perception and a task-specific hard router to select relevant visual features and H-LoRA plugins, enabling autoregressive text and image generation.
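HealthGPT's actual implementation is not reproduced here, but the core idea of task-specific low-rank plugins selected by a hard router can be sketched in a few lines. The class and parameter names below (`HLoRAPlugin`, `RoutedLayer`, the `"comprehension"`/`"generation"` task keys) are illustrative assumptions, not the project's real API:

```python
import numpy as np

class HLoRAPlugin:
    """One low-rank adapter applied on top of a frozen base layer:
    delta = (x @ A^T) @ B^T * scale. Names are illustrative only."""
    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(0, 0.02, (rank, d_in))   # down-projection
        self.B = np.zeros((d_out, rank))             # up-projection (zero-init)
        self.scale = 1.0

    def delta(self, x):
        # Low-rank update added to the frozen base output
        return (x @ self.A.T) @ self.B.T * self.scale

class RoutedLayer:
    """Frozen base weight plus per-task H-LoRA plugins, chosen by a
    hard router (here simplified to an explicit task id)."""
    def __init__(self, d_in, d_out, rank=4):
        rng = np.random.default_rng(42)
        self.W = rng.normal(0, 0.02, (d_out, d_in))  # frozen base weights
        self.plugins = {
            "comprehension": HLoRAPlugin(d_in, d_out, rank, seed=1),
            "generation":    HLoRAPlugin(d_in, d_out, rank, seed=2),
        }

    def forward(self, x, task):
        base = x @ self.W.T
        return base + self.plugins[task].delta(x)

layer = RoutedLayer(d_in=16, d_out=8)
x = np.ones((1, 16))
out_c = layer.forward(x, task="comprehension")  # comprehension plugin
out_g = layer.forward(x, task="generation")     # generation plugin
```

Only the small `A`/`B` matrices of each plugin would be trained; the base weights stay frozen, which is what makes adapting a large pre-trained LLM to multiple medical tasks tractable.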
Quick Start & Requirements
Create a conda environment (`conda create -n HealthGPT python=3.10`), activate it (`conda activate HealthGPT`), and install dependencies (`pip install -r requirements.txt`). Run inference via the provided scripts (`com_infer.sh`, `gen_infer.sh`) or the equivalent Python commands, specifying paths to the downloaded weights and model configurations. A Gradio UI is available via `python app.py`.
Highlighted Details
Maintenance & Community
The project is associated with multiple academic institutions and Alibaba. The README indicates ongoing development with planned releases for training scripts and a project website.
Licensing & Compatibility
Licensed under Apache License 2.0, permitting commercial use and closed-source linking.
Limitations & Caveats
Full training scripts and complete H-LoRA weights for generation tasks are not yet released. The project builds on recent large backbones (e.g., Qwen2.5-32B-Instruct), which carry their own substantial hardware requirements for inference.