BianQue by scutcyr

Chinese medical dialogue model for proactive health applications

created 2 years ago
846 stars

Top 43.1% on sourcepulse

Project Summary

BianQue is an open-source Chinese medical dialogue large language model developed at South China University of Technology. It aims to strengthen the "questioning" ability of healthcare AI: like a doctor, it gathers information from the patient over multiple conversational turns, and it also provides health suggestions. The project targets researchers and developers working on proactive health, particularly chronic disease and mental health.

How It Works

BianQue is built upon the ProactiveHealthGPT foundation model and fine-tuned on the "BianQue Health Big Data" corpus, a dataset of millions of Chinese health dialogues. The corpus emphasizes multi-turn questioning (Chain of Questioning, CoQ) to address a limitation of existing medical LLMs, which mostly handle single-turn interactions. BianQue-2.0 specifically uses ChatGLM-6B as its base model and is further fine-tuned on instruction data that includes drug package inserts, medical encyclopedias, and data distilled from ChatGPT, which strengthens its suggestion and knowledge-retrieval capabilities.

Quick Start & Requirements

  • Installation: Clone the repository and create a Conda environment from proactivehealthgpt_py38.yml. Then install cpm_kernels (pip install cpm_kernels) and a PyTorch build that matches your CUDA version (e.g., torch==1.13.1+cu116).
  • Prerequisites: CUDA >= 11.6 is recommended for GPU acceleration. The environment file specifies Python 3.8. Windows users need to configure CUDA 11.6 and cuDNN manually.
  • Demo: Launch the Streamlit demo application with streamlit run bianque_v2_app.py --server.port 9005.
  • Documentation: The repository provides Python code snippets showing single-turn and multi-turn conversations.
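The multi-turn usage mentioned above can be sketched as a prompt builder that interleaves the conversation so far and leaves the doctor's next turn open for the model to complete. This is a minimal sketch, not the project's code: the role prefixes 病人： ("Patient:") and 医生： ("Doctor:"), the newline separator, and the function name build_prompt are assumptions made for illustration. The actual prompt format should be taken from the repository's own snippets.

```python
from typing import List

# Assumed role prefixes; check the repository's snippets for the real format.
PATIENT = "病人："  # "Patient:"
DOCTOR = "医生："   # "Doctor:"


def build_prompt(user_turns: List[str], bot_turns: List[str]) -> str:
    """Interleave past patient/doctor turns and end with an open doctor turn.

    user_turns must contain exactly one more entry than bot_turns, because
    the latest patient message has not been answered yet.
    """
    if len(user_turns) != len(bot_turns) + 1:
        raise ValueError("expected exactly one more patient turn than doctor turns")
    lines = []
    for i, user in enumerate(user_turns):
        lines.append(PATIENT + user)
        if i < len(bot_turns):
            lines.append(DOCTOR + bot_turns[i])
    # The trailing doctor prefix asks the model to produce the next reply.
    return "\n".join(lines) + "\n" + DOCTOR


# Single-turn: one patient message, no history.
single = build_prompt(["我最近睡眠不好"], [])
# Multi-turn: the doctor has already asked one follow-up question.
multi = build_prompt(["我最近睡眠不好", "大概两周了"], ["这种情况持续多久了？"])
```

With a single patient message the same function covers the single-turn case, so the two modes differ only in how much history is passed in. The resulting string would then be handed to whatever generation call the loaded model exposes.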

Highlighted Details

  • BianQue-1.0 was fine-tuned on over 9 million samples from a mixed dataset of Chinese medical dialogues and multi-turn doctor inquiries. One epoch took about 16 days on 8 NVIDIA RTX 4090 GPUs.
  • BianQue-2.0 expands the training data with drug package inserts, medical encyclopedias, and data distilled from ChatGPT, strengthening its suggestion and knowledge-query abilities.
  • The project also released the "SoulChat" model for mental health, fine-tuned on long-text psychological consultation data and multi-turn empathetic dialogues.
  • A demo healthcare Q&A system based on BianQue-1.0 is available on Hugging Face Spaces.
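As a rough sense of scale for the BianQue-1.0 training run described above, the reported figures imply the following throughput. This is back-of-the-envelope arithmetic only: "over 9 million" is treated as exactly 9 million (a lower bound) and "~16 days" as exactly 16 days, so the results are estimates, not measured numbers.

```python
SAMPLES = 9_000_000  # "over 9 million" samples, so this is a lower bound
DAYS = 16            # "~16 days" for one epoch, approximate
GPUS = 8             # 8 NVIDIA RTX 4090 GPUs


def throughput(samples: int, days: float, gpus: int):
    """Return (samples per second overall, samples per second per GPU)."""
    seconds = days * 24 * 60 * 60
    overall = samples / seconds
    return overall, overall / gpus


overall, per_gpu = throughput(SAMPLES, DAYS, GPUS)
gpu_days = DAYS * GPUS  # total compute budget in GPU-days

print(f"{overall:.2f} samples/s overall, {per_gpu:.2f} per GPU, {gpu_days} GPU-days")
```

Under these assumptions the run works out to roughly 6.5 samples per second across the machine and 128 GPU-days in total.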

Maintenance & Community

The project was initiated by the Future Technology School and the Guangdong Key Laboratory of Digital Human at South China University of Technology. Contributions are welcome through GitHub issues and pull requests, as is collaboration with academic institutions, hospitals, and companies. Contact: eeyirongchen@mail.scut.edu.cn.

Licensing & Compatibility

BianQue-2.0 uses weights from ChatGLM-6B and is therefore bound by the ChatGLM-6B MODEL_LICENSE, which restricts it to non-commercial research use. Users must adhere to terms prohibiting commercial, military, or illegal use, and must acknowledge that model output is not a substitute for professional medical advice.

Limitations & Caveats

BianQue-1.0 was trained for only one epoch and may produce errors caused by noise in the training data. Its strongly "inquisitive" style can also come across as idiosyncratic. Of the four traditional diagnostic methods (望闻问切: observation, listening, inquiry, and pulse-taking), it currently covers only inquiry; observation, listening, and pulse-taking require further research. The project disclaims responsibility for the suitability of model output and for any reliance users place on its advice.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 34 stars in the last 90 days
