ZJUI-AI4H: Transparent medical vision-language model
Hulu-Med is a transparent, generalist vision-language model designed for holistic medical understanding. It unifies the processing of diverse modalities, including text, 2D images, 3D volumes, and videos, and targets researchers and practitioners in the medical AI domain. The project reports state-of-the-art performance on numerous medical benchmarks and is trained entirely on publicly available data, which promotes accessibility and reproducibility.
How It Works
The architecture features a SigLIP-based vision encoder (with 2D RoPE), a Qwen-based LLM decoder, and a multimodal projector. A key innovation is "Medical-Aware Token Reduction," achieving approximately 55% token reduction for efficient processing across diverse medical data modalities, enabling seamless integration and understanding.
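To give a sense of scale, the roughly 55% reduction can be turned into a back-of-the-envelope token budget. The sketch below is arithmetic only: the function name, the 256 tokens per slice, and the 64-slice volume are hypothetical values chosen for illustration, not figures from the project, and the code does not implement the reduction method itself.

```python
def tokens_after_reduction(num_frames: int, tokens_per_frame: int,
                           reduction: float = 0.55) -> int:
    """Estimate how many visual tokens remain after dropping a fraction of them.

    `reduction` defaults to the approximate 55% figure quoted above.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    total = num_frames * tokens_per_frame
    return round(total * (1.0 - reduction))


if __name__ == "__main__":
    # Hypothetical 3D volume: 64 slices, 256 visual tokens per slice.
    before = 64 * 256
    after = tokens_after_reduction(64, 256)
    print(f"tokens before: {before}, after ~55% reduction: {after}")
```

Under these assumed numbers, fewer than half of the original visual tokens reach the LLM decoder, which is where the efficiency gain for long videos and 3D volumes would come from.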
Quick Start & Requirements
Install with pip: PyTorch for CUDA 11.8, Flash-attn, Transformers, and multimedia libraries. HuggingFace Transformers integration is recommended for inference.
Dependencies: flash-attn, transformers, decord, ffmpeg-python, imageio, opencv-python, nibabel.
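Before attempting inference, it can help to confirm that the dependencies named above are present in the current environment. The following is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical, and the mapping from pip package names to import names reflects the usual module names for these packages rather than anything stated in the README.

```python
import importlib.util

# pip package name -> module name used at import time (assumed, see lead-in).
REQUIRED = {
    "torch": "torch",
    "flash-attn": "flash_attn",
    "transformers": "transformers",
    "decord": "decord",
    "ffmpeg-python": "ffmpeg",
    "imageio": "imageio",
    "opencv-python": "cv2",
    "nibabel": "nibabel",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose top-level module cannot be located."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

The check only locates each module without importing it, so it does not verify versions or that the installed PyTorch build matches CUDA 11.8.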
Maintenance & Community
Developed by the ZJU AI4H Team. No specific community channels (e.g., Discord, Slack) are detailed in the README.
Licensing & Compatibility
Released under the Apache 2.0 License, which is permissive and generally suitable for commercial use and integration into closed-source projects.
Limitations & Caveats
The README indicates that detailed instructions for downloading and preparing the training data are "Coming soon," potentially hindering full reproducibility or custom training efforts.