Large-scale multimodal dataset for medical research
This repository provides MedTrinity-25M, a large-scale multimodal dataset for medical applications, featuring multigranular annotations. It is designed for researchers and developers working on medical vision-language models, offering a comprehensive resource for training and evaluating AI systems in healthcare.
How It Works
The dataset construction involves a two-stage process: data processing to extract essential information and generate coarse captions, followed by multigranular textual description generation using MLLMs to create fine-grained annotations. This approach aims to capture detailed medical context, enabling more sophisticated understanding and generation capabilities in medical AI.
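To make the flow concrete, here is a minimal Python sketch of that two-stage construction. All record fields and helper names are hypothetical illustrations; none of them come from the repository itself.

```python
from dataclasses import dataclass

@dataclass
class MultigranularAnnotation:
    """Hypothetical shape of one MedTrinity-25M record (fields assumed)."""
    image_id: str
    modality: str          # e.g. "CT", "MRI", "X-ray"
    coarse_caption: str    # stage 1 output: metadata folded into a caption
    fine_description: str  # stage 2 output: MLLM-generated detailed text

def stage1_coarse_caption(metadata: dict) -> str:
    """Stage 1 sketch: extract essential information and produce a coarse caption."""
    return f"{metadata['modality']} image of the {metadata['organ']}."

def stage2_fine_description(coarse: str, mllm) -> str:
    """Stage 2 sketch: an MLLM expands the coarse caption into a
    multigranular description (regions of interest, attributes, relations)."""
    return mllm(f"Describe in fine-grained detail: {coarse}")

# Toy run with a stand-in "MLLM" callable
meta = {"modality": "CT", "organ": "liver"}
coarse = stage1_coarse_caption(meta)
fine = stage2_fine_description(coarse, mllm=lambda prompt: prompt.upper())
record = MultigranularAnnotation("img_0001", meta["modality"], coarse, fine)
print(record)
```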
Quick Start & Requirements
Install the package from the repository root:

pip install -e .

Additional packages for training are available via pip install -e ".[train]". flash-attn and scaling_on_scales are recommended for training.
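If the dataset is mirrored on the Hugging Face Hub, a typical loading pattern would look like the sketch below. The UCSC-VLAA/MedTrinity-25M identifier and the available splits are assumptions here; verify them on the Hub before use.

```python
# Hedged sketch: assumes the dataset is published on the Hugging Face Hub
# under "UCSC-VLAA/MedTrinity-25M" with a "train" split (unverified).
from datasets import load_dataset

# Streaming avoids downloading the full 25M-sample dataset up front.
ds = load_dataset("UCSC-VLAA/MedTrinity-25M", split="train", streaming=True)

sample = next(iter(ds))
print(sample.keys())  # inspect the annotation fields of one record
```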
Highlighted Details
Maintenance & Community
The project is associated with UCSC-VLAA and has an arXiv paper released. Acknowledgements mention support from Microsoft, OpenAI, TPU Research Cloud, Google Cloud, AWS, and Lambda Cloud. It builds upon LLaVA-pp and LLaVA-Med codebases.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify a license, which may hinder commercial adoption. The project is presented as a dataset with associated models, and its training scripts indicate a need for substantial computational resources.