Medical LLM training pipeline using ChatGPT techniques
MedicalGPT provides a comprehensive pipeline for training domain-specific large language models, with a focus on the medical domain. It lets users replicate the ChatGPT training methodology, including incremental pre-training, supervised fine-tuning (SFT), and preference-optimization techniques such as RLHF, DPO, ORPO, and GRPO. The project is useful for researchers and developers who want to build specialized medical AI assistants or extend existing LLMs with medical knowledge and conversational ability.
How It Works
The project implements a multi-stage training process modeled on the ChatGPT pipeline. It starts with optional incremental pre-training (PT) on large domain-specific corpora to adapt the base model to the medical domain. This is followed by supervised fine-tuning (SFT) on instruction-following datasets to align the model with user intent and inject medical knowledge. For further alignment with human preferences, it supports Reinforcement Learning from Human Feedback (RLHF) as well as Direct Preference Optimization (DPO), ORPO, and GRPO, which achieve alignment with simpler training loops than full PPO-based RLHF: DPO and ORPO learn directly from preference pairs without a separate reward model, and GRPO drops the value network.
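As one concrete illustration of the preference-optimization stage, the sketch below runs DPO with Hugging Face's trl library. It is a minimal, generic example rather than the project's own training script: the checkpoint path, dataset file, and hyperparameters are placeholders, and the exact DPOTrainer signature varies across trl versions.

```python
# Hedged sketch of a DPO preference-optimization step with the trl library.
# Paths, the dataset file, and hyperparameters are placeholders; the exact
# trainer signature differs between trl versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_dir = "path_to_sft_checkpoint"  # assumption: output of the SFT stage
model = AutoModelForCausalLM.from_pretrained(sft_dir)
tokenizer = AutoTokenizer.from_pretrained(sft_dir)

# DPO trains on preference pairs: each record carries a "prompt" plus a
# "chosen" (preferred) and a "rejected" (dispreferred) response.
train_dataset = load_dataset(
    "json", data_files="medical_preferences.jsonl", split="train"
)

args = DPOConfig(
    output_dir="outputs-dpo",
    beta=0.1,                      # strength of the KL-style preference penalty
    per_device_train_batch_size=2,
    num_train_epochs=1,
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,    # named `tokenizer=` in older trl releases
)
trainer.train()
```

Because DPO optimizes the policy directly against preference pairs, this stage needs no separate reward model, which is what makes it lighter than PPO-based RLHF.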
Quick Start & Requirements
# Install dependencies
pip install -r requirements.txt --upgrade

# Launch the interactive Gradio demo with a Hugging Face-format base model
CUDA_VISIBLE_DEVICES=0 python gradio_demo.py --base_model path_to_llama_hf_dir
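A trained model can also be queried outside the Gradio demo with plain transformers. The sketch below is a generic generation loop, not part of the project's codebase; the checkpoint path, prompt, and generation settings are illustrative assumptions.

```python
# Hedged sketch: scripted inference with transformers, outside the Gradio demo.
# The checkpoint path and generation settings are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path_to_llama_hf_dir"  # same HF-format directory used by the demo
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

prompt = "What are the common symptoms of iron-deficiency anemia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```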
Maintenance & Community
The project is actively maintained, with frequent updates adding support for new models and training methods. Community engagement is encouraged via GitHub issues.
Licensing & Compatibility
The code is licensed under the Apache License 2.0, permitting commercial use. However, model weights and data are restricted to research purposes only. The project ships with a disclaimer, and products built on it must credit MedicalGPT in their product descriptions.
Limitations & Caveats
While the project supports numerous models and training methods, setup and execution are resource-intensive, demanding substantial VRAM and compute, especially for full-parameter training. The README notes that the code is "still rough" and encourages community contributions for improvements and testing.