MedReason by UCSC-VLAA

Medical reasoning dataset and models for LLMs

Created 11 months ago
255 stars

Top 98.7% on SourcePulse

View on GitHub
Project Summary

MedReason addresses the challenge of enabling faithful and explainable medical reasoning in Large Language Models (LLMs). It provides a large-scale dataset of 32,682 medical QA pairs, augmented with step-by-step reasoning paths derived from a knowledge graph. This resource aims to improve LLM performance on complex medical problem-solving tasks, benefiting researchers and developers in the medical AI domain.

How It Works

The project leverages a structured medical knowledge graph to systematically convert clinical question-answer pairs into detailed, logical "thinking paths." This automated pipeline generates a high-quality dataset for supervised finetuning (SFT). By training LLMs on these explicit reasoning steps, MedReason enhances their ability to provide accurate and explainable medical insights, moving beyond simple answer generation.
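The reasoning-path idea can be illustrated with a minimal sketch. The field names and step wording below are illustrative assumptions for this summary, not the dataset's actual schema:

```python
import json

# Hypothetical MedReason-style record: a clinical QA pair plus a
# knowledge-graph-derived chain of reasoning steps. Field names are
# assumptions, not the published dataset's schema.
record = {
    "question": (
        "A patient with fatigue and microcytic anemia most likely has "
        "a deficiency of which element?"
    ),
    "answer": "Iron",
    "reasoning_path": [
        "Microcytic anemia is characterized by abnormally small red blood cells.",
        "A knowledge graph links microcytic anemia to iron-deficiency anemia.",
        "Iron-deficiency anemia is caused by insufficient iron.",
        "Therefore, the most likely deficiency is iron.",
    ],
}

# For SFT, the steps are concatenated into an explicit "thinking" segment
# that precedes the final answer, so the model learns to reason before answering.
sft_target = "\n".join(record["reasoning_path"]) + "\n## Answer\n" + record["answer"]
print(json.dumps(record, indent=2))
```

Training on such explicit step sequences is what distinguishes this approach from finetuning on bare answer strings.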

Quick Start & Requirements

  • Inference: Pre-trained models such as UCSC-VLAA/MedReason-8B can be loaded directly with Hugging Face transformers (a usage example is provided in the README). Deployment is supported via vllm or SGLang.
  • Data Generation: Requires Python, Azure API keys, and configuration of dataset paths.
  • Training: Finetuning requires significant resources, specifically 8-GPU setups, utilizing accelerate and deepspeed (Zero3 configuration). Base models include HuatuoGPT-o1-8B and DeepSeek-R1-Distill-Llama-8B.
  • Evaluation: Uses SGLang to serve models, with evaluation scripts provided in the repository.
  • Dependencies: transformers, torch, accelerate, deepspeed, sglang.
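A minimal inference sketch with Hugging Face transformers, assuming the model follows the standard chat-template convention (the model ID is taken from the README; the question and generation settings are illustrative):

```python
MODEL_ID = "UCSC-VLAA/MedReason-8B"  # model ID from the README

def build_messages(question: str) -> list:
    """Wrap a medical question in the chat-message format used by apply_chat_template."""
    return [{"role": "user", "content": question}]

def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # transformers is imported lazily: it is a heavy dependency only
    # needed when actually running the model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_answer("Which vitamin deficiency causes scurvy?"))
```

For higher-throughput serving, the README points to vllm or SGLang rather than raw transformers generation.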

Highlighted Details

  • The MedReason dataset comprises 32,682 QA pairs with detailed reasoning explanations.
  • The MedReason-8B model achieves state-of-the-art performance on medical reasoning benchmarks.
  • The dataset received 3rd prize in the Hugging Face Reasoning Datasets Competition (May 2025).
  • An arXiv paper detailing the methodology and results is available.

Maintenance & Community

The project acknowledges foundational models and tools including HuatuoGPT, trl, and SGLang. The README provides no direct community channels (Discord/Slack) and no explicit roadmap.

Licensing & Compatibility

The README does not specify a software license. This lack of clarity poses a significant adoption risk, particularly for commercial use or integration into proprietary systems.

Limitations & Caveats

The README does not detail known limitations, bugs, or alpha status. Data generation requires Azure API keys. Training and evaluation necessitate substantial hardware resources (e.g., 8-GPU clusters).

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 7 stars in the last 30 days
