This repository provides instruction-tuned large language models (LLMs) specifically for the Chinese medical domain, named BenTsao (formerly HuaTuo). It aims to improve LLM performance in medical question answering by fine-tuning base models like LLaMA, Bloom, and Huozi with a custom Chinese medical instruction dataset derived from knowledge graphs and literature.
How It Works
The project employs LoRA (Low-Rank Adaptation) for efficient instruction fine-tuning, balancing computational resources and model performance. A key innovation is "knowledge-tuning," which involves a three-stage process: extracting parameters from a question to query a medical knowledge base, retrieving relevant knowledge, and then using this knowledge to generate an answer. This approach aims to make LLMs explicitly utilize structured medical knowledge during inference for more reliable responses.
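The three-stage flow described above can be sketched with a toy pipeline. The knowledge base, the naive entity-matching rule, and the prompt format below are illustrative stand-ins for this sketch, not the project's actual implementation:

```python
# Toy sketch of the three-stage knowledge-tuning flow:
# (1) extract parameters from the question, (2) retrieve matching
# knowledge, (3) build a knowledge-grounded prompt for the LLM.
# TOY_KB is a hypothetical stand-in for a medical knowledge base.

TOY_KB = {
    "肝癌": {"symptom": "abdominal pain, weight loss",
             "treatment": "surgical resection, targeted therapy"},
    "高血压": {"symptom": "headache, dizziness",
               "treatment": "lifestyle changes, antihypertensive drugs"},
}

def extract_parameters(question: str) -> list[str]:
    """Stage 1: pull candidate entities out of the question (naive substring match)."""
    return [entity for entity in TOY_KB if entity in question]

def retrieve_knowledge(entities: list[str]) -> list[str]:
    """Stage 2: look up facts for each extracted entity."""
    facts = []
    for entity in entities:
        for relation, value in TOY_KB[entity].items():
            facts.append(f"{entity} | {relation} | {value}")
    return facts

def build_prompt(question: str) -> str:
    """Stage 3: prepend retrieved facts so the model answers with explicit knowledge."""
    facts = retrieve_knowledge(extract_parameters(question))
    context = "\n".join(facts) if facts else "(no matching knowledge)"
    return f"Knowledge:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("肝癌有哪些治疗方法？"))
```

A real implementation would replace the substring match with a trained extractor and query a structured knowledge graph such as CMeKG, but the control flow is the same.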
Quick Start & Requirements
- Install dependencies: `pip install -r requirements.txt`
- Python 3.9+ recommended.
- LoRA weights are available via Baidu Netdisk or Hugging Face.
- Inference scripts are provided for different base models and data sources.
- Example inference command:
  `python infer.py --base_model 'BASE_MODEL_PATH' --lora_weights 'LORA_WEIGHTS_PATH' --use_lora True --instruct_dir 'INFER_DATA_PATH' --prompt_template 'TEMPLATE_PATH'`
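The `--prompt_template` flag points at a template file that is filled in before each query. As a minimal sketch, assuming an Alpaca-style template (the actual templates ship with the repository; the wording below is an illustrative assumption):

```python
# Minimal sketch of applying a prompt template before inference.
# ALPACA_STYLE_TEMPLATE is an assumed, illustrative template; the real
# ones are loaded from the path passed via --prompt_template.

ALPACA_STYLE_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def generate_prompt(instruction: str, template: str = ALPACA_STYLE_TEMPLATE) -> str:
    """Fill the template with the medical instruction to be answered."""
    return template.format(instruction=instruction)

print(generate_prompt("小儿肥胖超重该如何治疗？"))
```

Each line of the file passed via `--instruct_dir` would be formatted this way and fed to the base model plus LoRA weights.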
Highlighted Details
- Offers fine-tuned models based on LLaMA, Bloom, Alpaca-Chinese, and the Huozi (活字) model.
- Dataset construction involves using the GPT-3.5 API with medical knowledge graphs (e.g., CMeKG) and medical literature (e.g., 2023 liver cancer literature).
- Published research papers detailing the methodology and datasets.
- LoRA fine-tuning on an A100-SXM-80GB GPU with batch size 128 uses ~40GB of VRAM; GPUs with 24GB of VRAM (e.g., RTX 3090/4090) are also expected to be sufficient.
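The VRAM figures above follow from LoRA training only the low-rank adapter matrices (A: d_in x r and B: r x d_out) while the base weights stay frozen, so gradients and optimizer states are needed for only a tiny fraction of parameters. A back-of-the-envelope sketch, assuming a LLaMA-7B-like 4096 x 4096 attention projection and rank 8 (illustrative values, not the project's exact configuration):

```python
# Parameter count added by one LoRA adapter pair versus the frozen
# full weight matrix it augments. Dimensions and rank are assumptions
# for illustration.

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA pair: A (d_in x rank) + B (rank x d_out)."""
    return d_in * rank + rank * d_out

full = 4096 * 4096                           # frozen full weight matrix
lora = lora_trainable_params(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
# → full: 16,777,216  lora: 65,536  ratio: 0.3906%
```

Under 1% of the per-matrix parameters receive gradients, which is why adapter training fits in far less memory than full fine-tuning.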
Maintenance & Community
- Developed by the Health Intelligence Group at the SCIR Center, Harbin Institute of Technology.
- Key contributors and supervising professors are listed.
- References and acknowledges several open-source projects including Huozi, LLaMA, Stanford Alpaca, and CMeKG.
Licensing & Compatibility
- The project explicitly states that all related resources are restricted to academic research; commercial use is strictly prohibited.
- Use of third-party code is subject to their respective open-source licenses.
Limitations & Caveats
- The project's dataset is largely model-generated and should not be used for actual medical diagnosis.
- The accuracy of model-generated content is not guaranteed due to factors like randomness and quantization.
- The README notes that LLaMA-based models may occasionally produce errors or repetition, owing to the limited Chinese text in LLaMA's pretraining corpus and a still-rough knowledge-integration method; the Huozi-based models are recommended for better performance.