medmcqa
Medical MCQA dataset for advanced reasoning
MedMCQA: Large-Scale Medical MCQA Dataset
MedMCQA is a large-scale dataset for multiple-choice question answering (MCQA) in the medical domain, built from real-world medical entrance exam questions. It is aimed at NLP researchers and developers building QA systems that need deeper reasoning across diverse medical subjects, and it supports the development of models that can understand and answer complex medical queries.
How It Works
This project provides a curated dataset comprising over 194,000 high-quality MCQs sourced from AIIMS and NEET PG medical entrance examinations. It covers 2.4k healthcare topics and 21 distinct medical subjects, offering high topical diversity. Each data instance includes a question, multiple-choice options, the correct answer, and an expert's explanation, designed to test complex reasoning abilities beyond simple recall. The dataset is structured with splits based on actual exams to promote robust model generalization and evaluation.
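To make the shape of a data instance concrete, here is a minimal Python sketch of one record and a simple accuracy check. The class name, field names, and the sample question are assumptions made for illustration. They are not the dataset's actual schema or contents, so check the repository for the real column names before relying on them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MCQInstance:
    """One multiple-choice question. All field names here are hypothetical."""
    question: str
    options: List[str]   # answer candidates, in presentation order
    answer_index: int    # 0-based index into `options` of the correct answer
    explanation: str     # expert's explanation of the correct answer
    subject: str         # medical subject the question belongs to

    def __post_init__(self) -> None:
        # Reject records whose answer does not point at one of the options.
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index must point at one of the options")


def accuracy(instances: Sequence[MCQInstance], predictions: Sequence[int]) -> float:
    """Fraction of instances whose predicted option index equals the answer."""
    if len(instances) != len(predictions):
        raise ValueError("need exactly one prediction per instance")
    if not instances:
        return 0.0
    correct = sum(
        1 for inst, pred in zip(instances, predictions)
        if pred == inst.answer_index
    )
    return correct / len(instances)


# Made-up sample for illustration only; it is not taken from the dataset.
example = MCQInstance(
    question="Deficiency of which vitamin causes scurvy?",
    options=["Vitamin A", "Vitamin C", "Vitamin D", "Vitamin K"],
    answer_index=1,
    explanation="Scurvy results from a lack of vitamin C (ascorbic acid).",
    subject="Biochemistry",
)
```

A model's output for a batch can then be scored with `accuracy(instances, predictions)`, where each prediction is the index of the option the model selected.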
Quick Start & Requirements
Install: pip3 install -r requirements.txt
Data: https://drive.google.com/uc?export=download&id=15VkJdq5eyWIkfb_aoD3oS8i4tScbHYky
Train: python3 train.py --model bert-base-uncased --dataset_folder_name "/content/medmcqa_data/"
Links: Repository (https://github.com/medmcqa/medmcqa), Paper (https://arxiv.org/abs/2203.14371), Homepage (https://medmcqa.github.io).
Highlighted Details
Maintenance & Community
Contact: medmcqa [at] gmail.com.
Licensing & Compatibility
Limitations & Caveats
Status: Inactive (3 years ago)