LLaVA-Med by Microsoft

Biomedical multimodal large language model built toward GPT-4 level capabilities

created 2 years ago
1,976 stars

Top 22.8% on sourcepulse

View on GitHub
Project Summary

LLaVA-Med is a multimodal large language model designed for biomedical applications, aiming to achieve GPT-4 level capabilities in understanding and responding to visual and textual medical information. It is targeted at AI researchers and developers working in biomedical vision-language processing and visual question answering. The project offers a foundation for building advanced AI assistants capable of interpreting medical images and related text.

How It Works

LLaVA-Med builds on the general-domain LLaVA model and continues training it with a curriculum learning strategy: the model first aligns biomedical concepts, then undergoes full instruction tuning on biomedical instruction-following data. This staged process is designed to adapt the model efficiently to the complexities of the biomedical domain, improving performance on specialized tasks. A conceptual sketch of the two stages follows.
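As a rough illustration only (not the official training code), the sketch below expresses the two-stage curriculum as parameter freezing in PyTorch: stage 1 trains only the projector while the rest of the model is frozen, and stage 2 unfreezes the language model for end-to-end instruction tuning. The module names (vision_encoder, projector, language_model) and hyperparameters are illustrative placeholders, not the repository's actual classes or settings.

```python
# Conceptual sketch of the two-stage curriculum, with toy stand-in modules.
import torch
import torch.nn as nn

class ToyLLaVAMed(nn.Module):
    def __init__(self, vis_dim=32, txt_dim=64):
        super().__init__()
        self.vision_encoder = nn.Linear(vis_dim, vis_dim)  # stands in for the vision tower
        self.projector = nn.Linear(vis_dim, txt_dim)        # maps image features into the LLM space
        self.language_model = nn.Linear(txt_dim, txt_dim)   # stands in for the LLM backbone

    def forward(self, image_feats):
        return self.language_model(self.projector(self.vision_encoder(image_feats)))

model = ToyLLaVAMed()

# Stage 1 -- biomedical concept alignment: only the projector is updated,
# so image features are aligned to the frozen language model's space.
for p in model.parameters():
    p.requires_grad = False
for p in model.projector.parameters():
    p.requires_grad = True
stage1_opt = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-3)

# Stage 2 -- full instruction tuning: projector and language model are
# trained end-to-end on biomedical instruction-following data.
for p in model.projector.parameters():
    p.requires_grad = True
for p in model.language_model.parameters():
    p.requires_grad = True
stage2_opt = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5)
```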

Quick Start & Requirements

  • Install: Clone the repository and install dependencies using pip install -e . within a Python 3.10 conda environment.
  • Model Download: Load LLaVA-Med v1.5 directly from Hugging Face Hub (microsoft/llava-med-v1.5-mistral-7b); a download sketch follows this list.
  • Prerequisites: Requires Python 3.10, PyTorch, and potentially multiple GPUs for larger models or faster inference. Azure OpenAI API keys are needed for GPT-assisted evaluation.
  • Serving: Launch the controller, model worker, and Gradio web server using the provided Python commands.
  • Docs: Paper, Hugging Face Hub
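A minimal sketch of the model download step, assuming only the huggingface_hub package; loading and serving go through the repository's own LLaVA-style tooling (installed via pip install -e .), so check the repo README for the exact entry points and flags.

```python
# Minimal sketch: fetch the LLaVA-Med v1.5 checkpoint from the Hugging Face Hub.
# Assumes: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="microsoft/llava-med-v1.5-mistral-7b")
print(f"Checkpoint files downloaded to: {local_dir}")

# From here, the weights are typically loaded through the repository's own
# LLaVA-style loader rather than a plain transformers AutoModel call; the
# serving stack (controller, model worker, Gradio web server) is likewise
# launched with the commands given in the repo README.
```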

Highlighted Details

  • Built toward GPT-4 level capabilities in the biomedical domain.
  • Accepted as a spotlight presentation at NeurIPS 2023 Datasets and Benchmarks Track.
  • Trained on a large-scale biomedical multimodal instruction-following dataset.
  • Supports serving via a web UI and includes evaluation pipelines.

Maintenance & Community

The project is developed by Microsoft. The latest version, v1.5, was released on May 13, 2024, bringing significant improvements and easier usage. The original codebase (v1.0.0) has been archived.

Licensing & Compatibility

The data, code, and model checkpoints are released under the MSR release policy and are intended for research use only. They are subject to the Terms of Use of LLaMA, Vicuna, and GPT-4. The data is licensed under CC BY NC 4.0. Commercial use and clinical decision-making are expressly prohibited.

Limitations & Caveats

This model is English-only and has been evaluated on a limited set of biomedical benchmark tasks, making it unsuitable for clinical settings. It may produce inaccurate predictions and inherits limitations from the base LLaVA model. Biases present in the PMC-15M dataset may also be reflected in the model's outputs.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 124 stars in the last 90 days
