Large-scale multimodal dataset for medical research
This repository provides MedTrinity-25M, a large-scale multimodal dataset for medical applications, featuring multigranular annotations. It is designed for researchers and developers working on medical vision-language models, offering a comprehensive resource for training and evaluating AI systems in healthcare.
How It Works
The dataset construction involves a two-stage process: data processing to extract essential information and generate coarse captions, followed by multigranular textual description generation using MLLMs to create fine-grained annotations. This approach aims to capture detailed medical context, enabling more sophisticated understanding and generation capabilities in medical AI.
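To make the flow concrete, here is a minimal Python sketch of that two-stage construction. All record fields and helper names are hypothetical illustrations; none of them come from the repository itself.

```python
from dataclasses import dataclass

@dataclass
class MultigranularAnnotation:
    """Hypothetical shape of one MedTrinity-25M record (fields assumed)."""
    image_id: str
    modality: str          # e.g. "CT", "MRI", "X-ray"
    coarse_caption: str    # stage 1 output: metadata folded into a caption
    fine_description: str  # stage 2 output: MLLM-generated detailed text

def stage1_coarse_caption(metadata: dict) -> str:
    """Stage 1 sketch: extract essential information and produce a coarse caption."""
    return f"{metadata['modality']} image of the {metadata['organ']}."

def stage2_fine_description(coarse: str, mllm) -> str:
    """Stage 2 sketch: an MLLM expands the coarse caption into a
    multigranular description (regions of interest, attributes, relations)."""
    return mllm(f"Describe in fine-grained detail: {coarse}")

# Toy run with a stand-in "MLLM" callable
meta = {"modality": "CT", "organ": "liver"}
coarse = stage1_coarse_caption(meta)
fine = stage2_fine_description(coarse, mllm=lambda prompt: prompt.upper())
record = MultigranularAnnotation("img_0001", meta["modality"], coarse, fine)
print(record)
```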
Quick Start & Requirements
Install the package from the repository root:

pip install -e .

Additional packages for training are available via pip install -e ".[train]". flash-attn and scaling_on_scales are recommended for training.
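If the dataset is mirrored on the Hugging Face Hub, a typical loading pattern would look like the sketch below. The UCSC-VLAA/MedTrinity-25M identifier and the available splits are assumptions here; verify them on the Hub before use.

```python
# Hedged sketch: assumes the dataset is published on the Hugging Face Hub
# under "UCSC-VLAA/MedTrinity-25M" with a "train" split (unverified).
from datasets import load_dataset

# Streaming avoids downloading the full 25M-sample dataset up front.
ds = load_dataset("UCSC-VLAA/MedTrinity-25M", split="train", streaming=True)

sample = next(iter(ds))
print(sample.keys())  # inspect the annotation fields of one record
```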
Highlighted Details
Maintenance & Community
The project is associated with UCSC-VLAA and has an arXiv paper released. Acknowledgements mention support from Microsoft, OpenAI, TPU Research Cloud, Google Cloud, AWS, and Lambda Cloud. It builds upon LLaVA-pp and LLaVA-Med codebases.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify a license, which may hinder commercial adoption. The project is presented as a dataset with associated models, and its training scripts indicate a need for substantial computational resources.