MedTrinity-25M by UCSC-VLAA

Large-scale multimodal dataset for medical research

created 1 year ago
355 stars

Top 79.7% on sourcepulse

Project Summary

This repository provides MedTrinity-25M, a large-scale multimodal dataset for medical applications, featuring multigranular annotations. It is designed for researchers and developers working on medical vision-language models, offering a comprehensive resource for training and evaluating AI systems in healthcare.

How It Works

The dataset is built in two stages. First, a data processing stage extracts essential information from the source data and generates coarse captions. Second, multimodal large language models (MLLMs) generate multigranular textual descriptions, producing the fine-grained annotations. The approach aims to capture detailed medical context, which in turn supports more sophisticated understanding and generation in medical AI.
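The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the repository's pipeline: the `Sample` fields, the `build_coarse_caption` and `annotate` helpers, and the `stub_mllm` stand-in are all hypothetical names introduced here, and a real run would replace the stub with an actual MLLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    """Hypothetical record for one image moving through the pipeline."""
    image_path: str
    metadata: dict[str, str]
    coarse_caption: str = ""
    fine_annotation: str = ""


def build_coarse_caption(metadata: dict[str, str]) -> str:
    """Stage 1 (assumed form): fold extracted metadata into a short caption."""
    parts = [f"{key}: {value}" for key, value in sorted(metadata.items())]
    return "; ".join(parts)


def annotate(sample: Sample, describe: Callable[[str, str], str]) -> Sample:
    """Run both stages; `describe` stands in for the MLLM used in stage 2."""
    sample.coarse_caption = build_coarse_caption(sample.metadata)
    # Stage 2: the model sees the image and the coarse caption as context.
    sample.fine_annotation = describe(sample.image_path, sample.coarse_caption)
    return sample


def stub_mllm(image_path: str, coarse_caption: str) -> str:
    """Placeholder for a real MLLM call; returns a templated string only."""
    return f"[fine-grained description of {image_path} given '{coarse_caption}']"


example = annotate(
    Sample("scan_001.png", {"modality": "CT", "organ": "lung"}),
    stub_mllm,
)
print(example.coarse_caption)
print(example.fine_annotation)
```

Passing the model in as a callable keeps the sketch runnable without any model weights and makes the dependency of stage 2 on the stage 1 caption explicit.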

Quick Start & Requirements

  • Installation: Clone the repository and run pip install -e . from its root. To add the packages needed for training, run pip install -e ".[train]".
  • Prerequisites: Python 3.10 and a Linux environment. The flash-attn and scaling_on_scales packages are recommended for training.
  • Resources: Training scripts are provided; model training likely requires significant computational resources.
  • Links: The dataset download and a tutorial are both hosted on Hugging Face.
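Before installing, it can help to confirm that the local environment matches the prerequisites listed above. The following is a small sketch using only the Python standard library; the `check_environment` helper is hypothetical, and the import names `flash_attn` and `s2wrapper` are assumptions about how the flash-attn and scaling_on_scales packages are imported, so adjust them if your installation differs.

```python
import importlib.util
import platform
import sys

# Assumed import names for the recommended training packages.
OPTIONAL_TRAINING_MODULES = ("flash_attn", "s2wrapper")


def check_environment(modules=OPTIONAL_TRAINING_MODULES):
    """Return a dict mapping each prerequisite to True if it is satisfied."""
    report = {
        "python_3_10": sys.version_info[:2] == (3, 10),
        "linux": platform.system() == "Linux",
    }
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'not satisfied'}")
```

The check only reports status and never raises, so a missing optional package does not block inference-only use.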

Highlighted Details

  • Offers pre-trained LLaVA-Med++ models fine-tuned on specific medical benchmarks (VQA-RAD, SLAKE, PathVQA).
  • Includes scripts for pre-training, fine-tuning, and evaluating LLaVA-Med++ models.
  • Provides a "Model-Zoo" with links to Hugging Face and Google Drive for the various models.
  • Details the dataset construction pipeline, including multigranular annotation generation.

Maintenance & Community

The project is associated with UCSC-VLAA and has an accompanying arXiv paper. The acknowledgements mention support from Microsoft, OpenAI, TPU Research Cloud, Google Cloud, AWS, and Lambda Cloud. The code builds on the LLaVA-pp and LLaVA-Med codebases.

Licensing & Compatibility

The README does not explicitly state a license. Whether commercial use or linking from closed-source software is permitted is not specified.

Limitations & Caveats

Because the README specifies no license, commercial adoption carries legal uncertainty. The project is presented as a dataset with associated models, and the provided training scripts indicate that training requires substantial computational resources.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5
  • Star history: 49 stars in the last 90 days
