Awesome-Medical-Healthcare-Dataset-For-LLM  by onejune2018

Curated list of datasets, models, and papers for medical LLMs

created 1 year ago
259 stars

Top 98.4% on sourcepulse

Project Summary

This repository is a curated list of datasets, models, and research papers on Large Language Models (LLMs) for medicine and healthcare. It is meant as a central reference for researchers, developers, and practitioners who build or evaluate medical LLMs.

How It Works

The repository categorizes resources into Datasets, Models, and Papers. Datasets are further broken down by language (Chinese and English) and type (dialogue, EHR, literature, etc.), with details on content, size, and access links. Models are listed with their base architecture, parameter count, key features, and availability. Papers are linked with their respective research contributions and often include code repositories.
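The categorization described above can be pictured as a small record type with a filter on top. This is a minimal sketch only: the repository is a list of links and does not ship a schema, so the `Resource` class, its field names, the `find_datasets` helper, and the placeholder entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Resource:
    """One catalog entry. All field names here are hypothetical."""
    name: str
    kind: str                        # "dataset", "model", or "paper"
    url: str
    language: Optional[str] = None   # datasets only, e.g. "zh" or "en"
    data_type: Optional[str] = None  # datasets only, e.g. "dialogue", "ehr"


def find_datasets(resources: Iterable[Resource],
                  language: Optional[str] = None,
                  data_type: Optional[str] = None) -> List[Resource]:
    """Return dataset entries matching the given language and/or type."""
    return [
        r for r in resources
        if r.kind == "dataset"
        and (language is None or r.language == language)
        and (data_type is None or r.data_type == data_type)
    ]


# Placeholder entries for illustration; not taken from the repository.
CATALOG = [
    Resource("example-zh-dialogue", "dataset", "https://example.com/a", "zh", "dialogue"),
    Resource("example-en-ehr", "dataset", "https://example.com/b", "en", "ehr"),
    Resource("example-medical-model", "model", "https://example.com/c"),
]

if __name__ == "__main__":
    for entry in find_datasets(CATALOG, language="zh"):
        print(entry.name, entry.url)
```

Leaving `language` or `data_type` as `None` skips that filter, so the same helper covers "all Chinese datasets" and "all dialogue datasets in any language".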

Quick Start & Requirements

This repository is a curated list, so there is nothing to install or run. Browse the links to reach each dataset, model, or paper; requirements depend on the specific resource accessed.

Highlighted Details

  • Extensive coverage of Chinese medical datasets, including dialogue, Q&A, and instruction-tuning data.
  • Comprehensive listing of medical LLMs, detailing their base models (e.g., LLaMA, ChatGLM, Qwen) and specific medical adaptations.
  • Includes a broad range of research papers and benchmarks relevant to LLMs in healthcare, covering areas from diagnosis to literature analysis.

Maintenance & Community

The repository is maintained by onejune2018 and lists several contributors. Community discussion and contributions go through the project's GitHub page.

Licensing & Compatibility

The repository itself is licensed under the MIT License. However, the datasets and models it lists may carry their own licenses; some resources, for example, use the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Users must verify the license of each individual component before use, especially for commercial applications.
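A per-resource license check of the kind described here might look like the sketch below. It is a simplification under stated assumptions: the license strings are SPDX-style identifiers, the two sets are illustrative and far from complete, and `allows_commercial_use` is a hypothetical helper, not something the repository provides. Anything it cannot classify is returned as `None` so that it gets a manual review.

```python
from typing import Optional

# Illustrative, incomplete sets of SPDX-style license identifiers.
NON_COMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0", "CC-BY-NC-ND-4.0"}
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def allows_commercial_use(license_id: Optional[str]) -> Optional[bool]:
    """Classify a license identifier.

    Returns True for known permissive licenses, False for known
    non-commercial licenses, and None when the license is missing or
    unrecognized and therefore needs a manual check.
    """
    if license_id is None:
        return None
    key = license_id.strip()
    if key in NON_COMMERCIAL:
        return False
    if key in PERMISSIVE:
        return True
    return None


if __name__ == "__main__":
    for lic in ("MIT", "CC-BY-NC-SA-4.0", None):
        print(lic, "->", allows_commercial_use(lic))
```

Returning `None` rather than guessing keeps custom or unlisted licenses from being treated as safe by default.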

Limitations & Caveats

As a curated list, the repository depends on the original sources for the quality and availability of linked resources. Some links may become outdated, and because the LLM field evolves rapidly, new resources are constantly emerging that the list may not yet cover. Users should independently verify the suitability and licensing of any dataset or model before integration.
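One way to spot outdated links before relying on them is a simple reachability pass. The sketch below uses only the Python standard library; `probe` and `check_links` are hypothetical helper names, and the approach assumes that a successful `HEAD` request is a reasonable proxy for "still available", which will not hold for servers that reject `HEAD` or require a login.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds.

    Malformed URLs, network failures, and HTTP error statuses all
    count as "not alive".
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


def check_links(urls: Iterable[str],
                is_alive: Callable[[str], bool] = probe) -> Dict[str, bool]:
    """Map each URL to whether it currently appears reachable."""
    return {url: is_alive(url) for url in urls}


if __name__ == "__main__":
    # Placeholder URL for illustration; substitute links from the list.
    for url, ok in check_links(["https://example.com/"]).items():
        print("OK  " if ok else "DEAD", url)
```

The `is_alive` parameter lets the probe be swapped out, for instance with a `GET`-based check for hosts that do not answer `HEAD`.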

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

31 stars in the last 90 days
