QiZhenGPT by CMKRG

Chinese medical LLM for medical Q&A, powered by a Chinese medical knowledge graph

Created 2 years ago

768 stars

Top 45.5% on SourcePulse

Project Summary

QiZhenGPT is an open-source Chinese medical large language model and AI assistant suite designed to enhance medical knowledge Q&A, clinical decision support, and administrative efficiency for healthcare professionals and patients. It leverages a proprietary Chinese medical knowledge base to fine-tune existing LLMs, aiming for higher accuracy in medical contexts than general-purpose models.

How It Works

QiZhenGPT is built upon a "data + knowledge dual-wheel drive" technical route. It fine-tunes base models like Chinese-LLaMA-Plus-7B, CaMA-13B, and ChatGLM-6B using a custom-built instruction dataset derived from real patient-doctor Q&A and structured medical knowledge. This approach aims to mitigate "data hallucination" common in models trained solely on synthetic data, leading to more factually accurate medical responses.
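The sketch below shows the data side of this route: it turns one structured knowledge-base entry into an instruction-tuning record. Several pieces are hypothetical and not described by the project: the `DrugFact` type, the `to_instruction` helper, the Alpaca-style instruction/input/output layout, and the English question template. The project's real dataset is in Chinese, and its exact schema is not given here.

```python
import json
from dataclasses import dataclass


@dataclass
class DrugFact:
    """Hypothetical knowledge-base entry: one drug and its indications."""
    drug: str
    indications: list[str]


def to_instruction(fact: DrugFact) -> dict[str, str]:
    """Render one structured fact as an Alpaca-style instruction record.

    The template wording and record layout are illustrative assumptions,
    not the project's documented format.
    """
    return {
        "instruction": f"What are the indications of {fact.drug}?",
        "input": "",
        # Join the structured list into a single natural-language answer.
        "output": f"{fact.drug} is indicated for: " + ", ".join(fact.indications) + ".",
    }


if __name__ == "__main__":
    record = to_instruction(DrugFact("Drug A", ["condition X", "condition Y"]))
    # ensure_ascii=False keeps Chinese text readable when real data is used.
    print(json.dumps(record, ensure_ascii=False))
```

Because each answer is generated from a knowledge-base entry, it stays tied to a curated fact rather than to free-form synthetic text. This is the property the route relies on to reduce hallucination.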

Quick Start & Requirements

Installation: pip install -r requirements.txt
Prerequisites: Requires base models (e.g., Chinese-LLaMA-Plus-7B, CaMA-13B, ChatGLM-6B) and downloaded LoRA weights. Specific instructions for merging weights and running demos are provided for each base model.
Resources: The training checkpoint details indicate that 6-7 A800 (80 GB) GPUs were used.
Links: Model Download Table, Quick Start Instructions
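The LoRA weights ship separately from the base models and must be merged into them before inference. Under the standard LoRA formulation (the one used by libraries such as PEFT), merging adds a scaled low-rank update to each adapted weight matrix: W' = W + (alpha / r) · B A. Here A is r × k, B is d × r, and r is the LoRA rank. The plain-Python sketch below shows only that arithmetic. The helper names are hypothetical, and the actual checkpoints should be merged with the project's own per-model scripts.

```python
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Naive matrix product of an (n x m) and an (m x p) matrix."""
    if len(x[0]) != len(y):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A) for one adapted weight matrix.

    w: base weight, d x k; a: LoRA down-projection, r x k;
    b: LoRA up-projection, d x r; r is the LoRA rank.
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # d x k low-rank update
    if len(delta) != len(w) or len(delta[0]) != len(w[0]):
        raise ValueError("LoRA update shape does not match base weight")
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    lora_a = [[1.0, 2.0]]      # rank r = 1
    lora_b = [[1.0], [0.0]]
    print(merge_lora(base, lora_a, lora_b, alpha=2.0))
    # prints [[3.0, 4.0], [0.0, 1.0]]
```

After merging, the model runs with the same architecture and cost as the base model. This is why the base model is required even though only the LoRA deltas are distributed.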

Highlighted Details

Performance: Benchmarks show QiZhen-CaMA-13B-Checkpoint-12400 achieving 91.49% accuracy on drug indications (Standard 1) and 95% on clinical manifestations, significantly outperforming ChatGPT and ChatGLM in medical Q&A.
Dataset: Uses 560K real patient-doctor Q&A pairs, 180K drug-related instructions, and 298K disease-related instructions, with the goal of improving factual accuracy.
MedCopilot: A practical application suite integrated with HIS and EMR systems, offering features like functional list assistants, diagnostic support, quality monitoring, and automated medical record generation.
Real-world Deployment: MedCopilot is actively used at the Second Affiliated Hospital of Zhejiang University.

Maintenance & Community

The project has seen regular updates, with the latest in August 2024. Specific community channels are not explicitly listed in the README.

Licensing & Compatibility

License: MIT License (as per LICENSE file).
Restrictions: The README explicitly states that the resources are for academic research only and that commercial use is strictly prohibited.

Limitations & Caveats

The project is strictly for academic research and prohibits commercial use. Some earlier checkpoints (e.g., QiZhen-Chinese-LLaMA-7B) exhibited "repetition phenomena" which can be mitigated by adjusting repetition_penalty. ChatGLM-6B fine-tuning was found to be less effective for factual medical Q&A due to severe hallucination issues.
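`repetition_penalty` is a standard generation parameter; in Hugging Face transformers, for example, it is an argument to `generate()`. The sketch below implements the CTRL-style rule behind it, on the assumption that this is the variant the project's demos use. Logits of tokens that have already appeared are divided by the penalty when positive and multiplied by it when negative, so values above 1.0 make repeats less likely. The function name is hypothetical, and it operates on plain lists for illustration.

```python
def apply_repetition_penalty(
    logits: list[float], generated_ids: list[int], penalty: float
) -> list[float]:
    """CTRL-style repetition penalty over one vocabulary-sized logit vector."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    out = list(logits)
    # Each previously generated token is penalized once, however often it appeared.
    for tok in set(generated_ids):
        score = out[tok]
        out[tok] = score * penalty if score < 0 else score / penalty
    return out


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]
    history = [0, 1, 0]
    print(apply_repetition_penalty(logits, history, penalty=2.0))
    # prints [1.0, -2.0, 0.5]
```

A penalty of 1.0 leaves the logits unchanged. Values that are too high can suppress legitimately repeated medical terms, so the setting usually needs tuning per checkpoint.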

Explore Similar Projects

PULSE by openmedlab

WiNGPT2 by winninghealth

Awesome-Medical-Healthcare-Dataset-For-LLM by onejune2018

Huatuo-26M by FreedomIntelligence

Sunsimiao by X-D-Lab

Zhongjing by SupritYoung

medAlpaca by kbressem

ChatMed by michael-wzhu

Med-ChatGLM by SCIR-HI

HuatuoGPT by FreedomIntelligence

meditron by epfLLM

Doctor-Dignity by llSourcell