AIDoctor by Jerry-XDL

Medical GPT model training with a ChatGPT-style pipeline

Created 5 months ago · 274 stars · Top 95.2% on sourcepulse

View on GitHub
Project Summary

AIDoctor provides a comprehensive pipeline for training a medical-focused GPT model built on the LLaMA architecture. It targets researchers and developers building specialized medical AI assistants, implementing a full ChatGPT-style training process: pre-training, supervised fine-tuning, and reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO).

How It Works

AIDoctor follows a four-stage training methodology:

  1) Continued pretraining (PT): further pretrain LLaMA on medical-domain corpora to give it specialized knowledge.
  2) Supervised fine-tuning (SFT): fine-tune on medical Q&A datasets so the model follows instructions.
  3) Reward modeling (RM): train a model that predicts human preferences ("helpful, honest, harmless") from ranked medical dialogue data (a minimal sketch of the ranking objective follows below).
  4) Reinforcement learning (RL): optimize the SFT model against the reward model to maximize preferred outputs.

This staged approach aims to systematically improve the model's medical accuracy and alignment with user needs.
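
The reward-modeling stage typically optimizes a pairwise (Bradley-Terry-style) ranking loss over chosen/rejected answer pairs. Below is a minimal PyTorch sketch of that objective; the function name and toy data are illustrative, not taken from the repository.

    import torch
    import torch.nn.functional as F

    def pairwise_rm_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
        # Pairwise ranking loss: push the scalar reward of the
        # human-preferred answer above that of the rejected answer.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage: scalar rewards for four (chosen, rejected) dialogue pairs.
    chosen = torch.tensor([1.2, 0.3, 0.8, 2.0])
    rejected = torch.tensor([0.5, 0.1, 1.0, 0.4])
    print(pairwise_rm_loss(chosen, rejected))  # smaller when chosen >> rejected

The RL stage then uses this reward signal (e.g., with PPO) to update the SFT policy; DPO skips the explicit reward model and optimizes on preference pairs directly.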

Quick Start & Requirements

  • Install/Run: python scripts/gradio_demo.py launches the interactive demo; sh scripts/run_pt.sh, sh run_sft.sh, sh run_rm.sh, and sh run_rl.sh run the four training stages in order (a hedged inference sketch follows this list).
  • Prerequisites: LLaMA model weights (HF format), medical datasets (e.g., shibing624/medical, FreedomIntelligence/HuatuoGPT-sft-data-v1), Python, PyTorch. GPU acceleration is highly recommended for training and inference.
  • Resources: Training requires significant computational resources (GPU memory, time) and large datasets.
  • Links: Hugging Face Demo, Datasets
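
Once an SFT adapter has been trained, inference with transformers and peft looks roughly like the following. This is a minimal sketch assuming HF-format LLaMA weights and a LoRA adapter directory; both paths are placeholders, not paths from the repository.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_model = "path/to/llama-hf"    # placeholder: HF-format LLaMA weights
    lora_adapter = "path/to/sft-lora"  # placeholder: adapter from the SFT stage

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, lora_adapter)  # attach LoRA weights
    model.eval()

    prompt = "What are the common side effects of metformin?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))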

Highlighted Details

  • Implements a full ChatGPT training pipeline (PT, SFT, RM, RLHF/DPO).
  • Trains on a Chinese medical dataset of roughly 2.4 million records.
  • Supports LLaMA models and LoRA fine-tuning (see the configuration sketch after this list).
  • Includes a Gradio-based demo for interactive inference.
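
LoRA fine-tuning with the peft library is typically configured along these lines. This is a minimal sketch assuming LLaMA-style attention projection names; the rank and alpha values are illustrative defaults, not the repository's settings.

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("path/to/llama-hf")  # placeholder

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                                  # low-rank dimension
        lora_alpha=16,                        # scaling factor
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the adapter weights are trainable

Freezing the base model and training only the low-rank adapters is what makes fine-tuning a LLaMA-scale model feasible on modest GPU hardware.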

Maintenance & Community

The project is actively developed by Jerry-XDL. Contributions are welcome via pull requests that include unit tests. The maintainer can be reached through GitHub Issues or email.

Licensing & Compatibility

The project code is licensed under Apache License 2.0, permitting commercial use. However, the model weights and data are restricted to research purposes only.

Limitations & Caveats

The SFT model may generate factually incorrect answers, may fail to recognize and refuse harmful instructions, and has limited ability in reasoning, coding, and multi-round dialogue. The model weights and data are strictly for research use; they must not be applied commercially or to any purpose that could cause societal harm.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
