QiZhenGPT  by CMKRG

Chinese medical LLM for medical Q&A, powered by a Chinese medical knowledge graph

created 2 years ago
740 stars

Top 47.8% on sourcepulse

Project Summary

QiZhenGPT is an open-source Chinese medical large language model and AI assistant suite designed to enhance medical knowledge Q&A, clinical decision support, and administrative efficiency for healthcare professionals and patients. It leverages a proprietary Chinese medical knowledge base to fine-tune existing LLMs, aiming for higher accuracy in medical contexts than general-purpose models.

How It Works

QiZhenGPT follows a "data + knowledge dual-wheel drive" technical route. It fine-tunes base models such as Chinese-LLaMA-Plus-7B, CaMA-13B, and ChatGLM-6B on a custom-built instruction dataset derived from real patient-doctor Q&A and structured medical knowledge. The approach aims to reduce the hallucinations common in models trained solely on synthetic data, and so to produce more factually accurate medical responses.
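To make the "structured knowledge to instruction data" step concrete, the following is a minimal sketch of how knowledge-base entries could be turned into instruction records. It is not the project's actual pipeline: the triple layout, the question templates, the `instruction`/`input`/`output` field names, and the placeholder values (`DrugA`, `DiseaseB`) are all assumptions made for illustration.

```python
import json

# Hypothetical structured knowledge entries: (entity, attribute, value).
# Placeholder values only; nothing here comes from the QiZhenGPT dataset.
TRIPLES = [
    ("DrugA", "indications", "condition X; condition Y"),
    ("DiseaseB", "clinical manifestations", "symptom P; symptom Q"),
]

# Assumed question templates, one per supported attribute.
TEMPLATES = {
    "indications": "What are the indications for {entity}?",
    "clinical manifestations": "What are the clinical manifestations of {entity}?",
}


def triple_to_record(entity, attribute, value):
    """Turn one knowledge triple into an instruction/input/output record."""
    return {
        "instruction": TEMPLATES[attribute].format(entity=entity),
        "input": "",
        "output": value,
    }


def build_dataset(triples):
    """Convert every triple whose attribute has a template; skip the rest."""
    return [triple_to_record(*t) for t in triples if t[1] in TEMPLATES]


records = build_dataset(TRIPLES)
print(json.dumps(records[0], ensure_ascii=False, indent=2))
```

Because each answer is copied from a knowledge-base entry rather than generated, the record's `output` stays tied to a curated fact, which is the intuition behind grounding the instruction set in structured knowledge.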

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Requires base models (e.g., Chinese-LLaMA-Plus-7B, CaMA-13B, ChatGLM-6B) and downloaded LoRA weights. Specific instructions for merging weights and running demos are provided for each base model.
  • Resources: The training checkpoint notes indicate that 6-7 A800 (80 GB) GPUs were used.
  • Links: Model Download Table, Quick Start Instructions
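The weight-merging step mentioned under Prerequisites is handled by the project's own per-model instructions. As background only, the sketch below shows the arithmetic that a LoRA merge generally performs: the low-rank update `B @ A`, scaled by `alpha / rank`, is added to the frozen base weight. The function names and the toy matrices are hypothetical, and plain Python lists stand in for real tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (rank, in_features)
B = [[0.5], [1.0]]  # shape (out_features, rank)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

After merging, the adapter is no longer needed at inference time, which is why a merged checkpoint can be loaded like an ordinary base model.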

Highlighted Details

  • Performance: Benchmarks show QiZhen-CaMA-13B-Checkpoint-12400 achieving 91.49% accuracy on drug indications (Standard 1) and 95% on clinical manifestations, significantly outperforming ChatGPT and ChatGLM in medical Q&A.
  • Dataset: Uses 560K real patient-doctor Q&A pairs, 180K drug-related instructions, and 298K disease-related instructions, with the aim of improving factual accuracy.
  • MedCopilot: A practical application suite integrated with HIS and EMR systems, offering features like functional list assistants, diagnostic support, quality monitoring, and automated medical record generation.
  • Real-world Deployment: MedCopilot is actively used at the Second Affiliated Hospital of Zhejiang University.

Maintenance & Community

The project has seen regular updates, with the latest in August 2024. Specific community channels are not explicitly listed in the README.

Licensing & Compatibility

  • License: MIT License (as per LICENSE file).
  • Restrictions: The project explicitly states that its resources are for academic research only and that commercial use is strictly prohibited.

Limitations & Caveats

The project is strictly for academic research and prohibits commercial use. Some earlier checkpoints (e.g., QiZhen-Chinese-LLaMA-7B) produced repetitive output, which can be mitigated by adjusting the repetition_penalty generation parameter. Fine-tuning ChatGLM-6B was found to be less effective for factual medical Q&A because of severe hallucination.
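To illustrate what tuning repetition_penalty does, here is a minimal sketch of the widely used CTRL-style rule: the logit of every token that has already been generated is divided by the penalty when positive and multiplied by it when negative, so a penalty above 1.0 makes repeats less likely. The function name and the toy logits are hypothetical; the project's demos may set this parameter through their generation library rather than with code like this.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Return a copy of logits with already-generated tokens penalized."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Both branches push the score toward "less likely" when penalty > 1.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


logits = [2.0, -1.0, 0.5, 3.0]  # toy scores for a 4-token vocabulary
history = [0, 1, 0]             # token ids generated so far

adjusted = apply_repetition_penalty(logits, history, penalty=2.0)
print(adjusted)  # [1.0, -2.0, 0.5, 3.0]
```

A penalty of 1.0 leaves the logits unchanged, so values slightly above 1.0 are a reasonable starting point when a checkpoint loops on the same phrase.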

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 29 stars in the last 90 days
