medmcqa by medmcqa

Medical MCQA dataset for advanced reasoning

Created 3 years ago

254 stars

Top 99.0% on SourcePulse

Project Summary

MedMCQA: Large-Scale Medical MCQA Dataset

MedMCQA is a large-scale dataset for multiple-choice question answering (MCQA) in the medical domain, built from real-world medical entrance exam questions. It is aimed at NLP researchers and developers building QA systems that must reason across diverse medical subjects rather than recall isolated facts.

How It Works

This project provides a curated dataset comprising over 194,000 high-quality MCQs sourced from AIIMS and NEET PG medical entrance examinations. It covers 2.4k healthcare topics and 21 distinct medical subjects, offering high topical diversity. Each data instance includes a question, multiple-choice options, the correct answer, and an expert's explanation, designed to test complex reasoning abilities beyond simple recall. The dataset is structured with splits based on actual exams to promote robust model generalization and evaluation.
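To make the shape of a data instance concrete, here is a minimal Python sketch of one record and a prompt formatter. The class name `MCQInstance`, its field names, the four-option assumption, and the sample question are all hypothetical and chosen for illustration; the keys in the released files may differ, so check the downloaded data before relying on them.

```python
from dataclasses import dataclass
from typing import List

LETTERS = "ABCD"  # assumption: four options per question


@dataclass
class MCQInstance:
    """Illustrative container; field names are NOT taken from the dataset files."""
    question: str
    options: List[str]
    answer_index: int  # 0-based index into `options`
    explanation: str
    subject: str = ""


def format_prompt(item: MCQInstance) -> str:
    """Render the question and its lettered options as a plain-text prompt."""
    lines = [item.question]
    for letter, option in zip(LETTERS, item.options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def is_correct(item: MCQInstance, predicted_letter: str) -> bool:
    """Compare a predicted option letter with the stored answer index."""
    letter = predicted_letter.strip().upper()
    if len(letter) != 1 or letter not in LETTERS:
        return False
    return LETTERS.index(letter) == item.answer_index


# Invented example for illustration only; it is not a row from MedMCQA.
example = MCQInstance(
    question="Which vitamin deficiency causes scurvy?",
    options=["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
    answer_index=2,
    explanation="Scurvy results from a lack of vitamin C (ascorbic acid).",
    subject="Biochemistry",
)

if __name__ == "__main__":
    print(format_prompt(example))
    print(is_correct(example, "C"))
```

Keeping the answer as an index rather than a letter avoids ambiguity if options are ever shuffled during training.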

Quick Start & Requirements

Install dependencies: pip3 install -r requirements.txt
Download data: https://drive.google.com/uc?export=download&id=15VkJdq5eyWIkfb_aoD3oS8i4tScbHYky
Run experiments: Clone the repository, install dependencies, download and unzip the data, then execute python3 train.py --model bert-base-uncased --dataset_folder_name "/content/medmcqa_data/".
Links: Repository (https://github.com/medmcqa/medmcqa), Paper (https://arxiv.org/abs/2203.14371), Homepage (https://medmcqa.github.io).
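After the download step, it can help to inspect a few records before training. The sketch below assumes the unzipped archive contains JSON Lines files (one JSON object per line); the summary does not confirm the file format or file names, so `read_jsonl` and the `train.json` name in the comment are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[Dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Hypothetical usage; adjust the folder and file name to the unzipped data:
#   records = list(read_jsonl(Path("/content/medmcqa_data") / "train.json"))
#   print(len(records), sorted(records[0].keys()))
```

Printing the keys of the first record is a quick way to confirm the actual field names before writing any preprocessing code.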

Highlighted Details

Features 194k+ high-quality MCQs from real medical entrance exams (AIIMS & NEET PG).
Covers 2.4k healthcare topics and 21 distinct medical subjects.
Designed to test 10+ distinct reasoning abilities.
Dataset split by exams (AIIMS PG, NEET PG) to promote model generalization.

Maintenance & Community

Point of contact: medmcqa [at] gmail.com.
No community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The README does not specify a license or any terms for commercial use or closed-source linking.

Limitations & Caveats

Test set evaluation requires submitting predictions via a Google Form, as ground truth is withheld to preserve integrity.
The absence of explicit licensing information poses a potential adoption blocker for commercial or sensitive projects.
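Because test-set labels are withheld, local evaluation has to run on a split that includes answers. A minimal sketch of per-subject accuracy follows; the function name `accuracy_by_subject` and the `(subject, gold, predicted)` tuple layout are assumptions for illustration, not part of the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_subject(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute accuracy per subject from (subject, gold_index, predicted_index) rows."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, gold, predicted in rows:
        total[subject] += 1
        if gold == predicted:
            correct[subject] += 1
    return {subject: correct[subject] / total[subject] for subject in total}


if __name__ == "__main__":
    # Invented rows for illustration; these are not real model results.
    demo = [("Anatomy", 0, 0), ("Anatomy", 1, 2), ("Surgery", 3, 3)]
    print(accuracy_by_subject(demo))
```

Breaking accuracy down by subject shows whether a model is weak in particular areas, which a single aggregate score would hide.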

Health Check

Last Commit: 3 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

2 stars in the last 30 days

Explore Similar Projects

PULSE by openmedlab

Chinese medical LLM for diverse NLP tasks (health education, exam questions, report interpretation)

Created 2 years ago

Updated 2 years ago

MedReason by UCSC-VLAA

Medical reasoning dataset and models for LLMs

Created 11 months ago

Updated 8 months ago

WiNGPT2 by winninghealth

Medical LLM for intelligent Q&A, diagnosis support, and medical knowledge access

Created 2 years ago

Updated 1 year ago

Huatuo-26M by FreedomIntelligence

Large Chinese medical QA dataset with 26M question-answer pairs

Created 2 years ago

Updated 1 year ago

Sunsimiao by X-D-Lab

Chinese medical LLM for safe, reliable healthcare access

Created 2 years ago

Updated 1 year ago

MedAgents by gersteinlab

Research paper on LLMs as collaborators for medical reasoning

Created 2 years ago

Updated 1 year ago

Starred by Yiran Wu (Coauthor of AutoGen).

large-qa-datasets by ad-freiburg

Collection of question answering datasets for NLP tasks

Created 6 years ago

Updated 1 year ago

Starred by Yaowei Zheng (Author of LLaMA-Factory).

GAOKAO-Bench by OpenLMLab

Evaluation framework for assessing LLMs using Chinese GAOKAO (college entrance exam) questions

Created 2 years ago

Updated 1 year ago

MING by MediaBrain-SJTU

Chinese medical LLM for medical consultation

Created 2 years ago

Updated 9 months ago

Starred by Malte Pietsch (Cofounder of deepset) and Bojan Tunguz (AI Scientist; Formerly at NVIDIA).

DocProduct by re-search

Medical Q&A with deep language models

Created 6 years ago

Updated 2 years ago

Med-ChatGLM by SCIR-HI

ChatGLM fine-tune for Chinese medical QA

Created 2 years ago

Updated 2 years ago

Starred by Wing Lian (Founder of Axolotl AI) and Elvis Saravia (Founder of DAIR.AI).

ChatDoctor by Kent0n-Li

Medical chat model fine-tuned on LLaMA for medical domain Q&A

Created 2 years ago

Updated 1 year ago
