Biomedical LLM for multimodal GPT-4 level capabilities
Top 22.8% on sourcepulse
LLaVA-Med is a multimodal large language model designed for biomedical applications, aiming to achieve GPT-4 level capabilities in understanding and responding to visual and textual medical information. It is targeted at AI researchers and developers working in biomedical vision-language processing and visual question answering. The project offers a foundation for building advanced AI assistants capable of interpreting medical images and related text.
How It Works
LLaVA-Med builds on the general-domain LLaVA model and adapts it to biomedicine with a curriculum learning strategy: training first aligns biomedical concepts and then performs full instruction-tuning. This staged process efficiently adapts the model to the complexities of the biomedical domain and improves performance on specialized tasks.
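As a rough illustration of this staged setup, the PyTorch sketch below freezes the language model while the vision-to-language projector is aligned, then unfreezes it for instruction-tuning. The module shapes, learning rates, and optimizer choices are hypothetical placeholders, not LLaVA-Med's actual training code.

```python
# Illustrative sketch only (not the official LLaVA-Med training code):
# the curriculum freezes the language model while the projector learns to
# align biomedical concepts, then unfreezes it for full instruction-tuning.
import torch
import torch.nn as nn

# Tiny stand-ins for the real components (vision tower, projector, LLM backbone).
vision_encoder = nn.Linear(1024, 1024)   # placeholder for the frozen vision tower
projector = nn.Linear(1024, 4096)        # maps image features into the LLM embedding space
language_model = nn.Linear(4096, 4096)   # placeholder for the language model backbone

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

# Stage 1: biomedical concept alignment -- only the projector is updated,
# so the language model's weights stay intact.
set_trainable(vision_encoder, False)
set_trainable(language_model, False)
set_trainable(projector, True)
stage1_optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-3)

# Stage 2: full instruction-tuning -- the projector and the language model
# are both updated on biomedical instruction-following data.
set_trainable(language_model, True)
stage2_params = list(projector.parameters()) + list(language_model.parameters())
stage2_optimizer = torch.optim.AdamW(stage2_params, lr=2e-5)
```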
Quick Start & Requirements
Run pip install -e . within a Python 3.10 conda environment. Model checkpoints are hosted on Hugging Face (e.g., microsoft/llava-med-v1.5-mistral-7b).
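A minimal sketch for fetching the checkpoint locally, assuming the huggingface_hub client is installed (the target directory below is illustrative):

```python
# Sketch: download the LLaVA-Med v1.5 weights from Hugging Face.
# Assumes `pip install huggingface_hub`; the local directory is arbitrary.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="microsoft/llava-med-v1.5-mistral-7b",
    local_dir="./checkpoints/llava-med-v1.5-mistral-7b",  # illustrative path
)
print(f"Checkpoint downloaded to {local_dir}")
```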
Highlighted Details
Maintenance & Community
The project is developed by Microsoft. The latest version, v1.5, was released on May 13, 2024, and is both significantly improved and easier to use. The original codebase (v1.0.0) has been archived.
Licensing & Compatibility
The data, code, and model checkpoints are released under the MSR release policy and are intended for research use only. They are subject to the Terms of Use of LLaMA, Vicuna, and GPT-4. The data is licensed under CC BY NC 4.0. Commercial use and clinical decision-making are expressly prohibited.
Limitations & Caveats
This model is English-only and has been evaluated on a limited set of biomedical benchmark tasks, making it unsuitable for clinical settings. It may produce inaccurate predictions and inherits limitations from the base LLaVA model. Biases present in the PMC-15M dataset may also be reflected in the model's outputs.