PromptCBLUE by michael-wzhu

Chinese instruction-tuning dataset for multi-task, few-shot medical NLP

created 2 years ago
374 stars

Top 76.9% on sourcepulse

Project Summary

PromptCBLUE is a benchmark dataset and evaluation framework for large language models (LLMs) in the Chinese medical domain. It transforms 16 existing CBLUE tasks into prompt-based generation tasks, aiming to standardize LLM evaluation in medical NLP. The project targets researchers and developers working with LLMs in healthcare, providing a unified platform for assessing model performance on diverse medical NLP challenges.

How It Works

PromptCBLUE reformulates 16 medical NLP tasks from the CBLUE benchmark into a prompt-based generation format. Each example is expressed as a structure with input, target, type, and answer_choices fields, so any text-generation LLM can be evaluated in a uniform way across the tasks via the prompt-engineering paradigm.
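A record in this format might look like the following sketch. Only the four field names come from the description above; the task name, prompt text, and answer choices are invented for illustration.

```python
import json

# Illustrative PromptCBLUE-style record. Field names (input, target,
# type, answer_choices) follow the benchmark's structure; the values
# below are hypothetical examples, not actual dataset content.
example = {
    "input": (
        "Determine the relation between the two medical terms: "
        "diabetes, hyperglycemia. Options: synonym, cause, no relation."
    ),
    "target": "cause",                     # expected generation
    "type": "relation_extraction",         # which CBLUE task it derives from
    "answer_choices": ["synonym", "cause", "no relation"],
}

# Serialize as it would appear in a JSON dataset file
# (ensure_ascii=False preserves Chinese characters in real data).
print(json.dumps(example, ensure_ascii=False, indent=2))
```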

Quick Start & Requirements

  • Dataset Download: Access full datasets via the PromptCBLUE evaluation websites (General Track or Open Source Track). Toy examples are available in datasets/toy_examples.
  • Submission: Submit a test_predictions.json file and a post_generate_process.py script (Python standard library only) for evaluation.
  • Resources: The project provides baseline code using ChatGLM-6B with p-tuning and LoRA. It also offers pre-trained models such as ChatMed-Consult and ChatMed-TCM, which are fine-tuned from LLaMA.
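The required `post_generate_process.py` might follow the shape sketched below: read raw model generations, normalize each one into a scorable answer, and write `test_predictions.json`, using only the standard library as the rules demand. The field names (`sample_id`, `output`, `answer`) and the first-line normalization rule are assumptions for illustration, not the benchmark's actual schema.

```python
import json
import sys


def postprocess(raw_output: str) -> str:
    """Normalize one raw model generation into a scorable answer string.
    Here: strip whitespace and keep only the first line. This is a
    placeholder rule; real post-processing depends on each task's
    expected output format."""
    stripped = raw_output.strip()
    return stripped.splitlines()[0] if stripped else ""


def convert(records):
    """Map raw generation records to prediction entries destined for
    test_predictions.json (field names here are assumptions)."""
    return [
        {
            "sample_id": rec.get("sample_id", i),
            "answer": postprocess(rec.get("output", "")),
        }
        for i, rec in enumerate(records)
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python post_generate_process.py raw_outputs.json test_predictions.json
    with open(sys.argv[1], encoding="utf-8") as f:
        raw = json.load(f)
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        json.dump(convert(raw), f, ensure_ascii=False, indent=2)
```

Keeping all logic in the standard library matters here because the evaluation server, per the submission rules, will not install third-party packages.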

Highlighted Details

  • Offers two tracks: a General Track for any LLM and an Open Source Track requiring open-source base models and datasets.
  • Includes supplementary resources: ChatMed_Consult_Dataset (500k+ online consultations with ChatGPT replies) and ChatMed_TCM_Dataset (26k+ TCM instructions).
  • Provides baseline implementations using ChatGLM-6B with p-tuning and LoRA, showing competitive performance.
  • Supports evaluation of ChatGPT via in-context learning (ICL) as a reference.

Maintenance & Community

  • Organized by researchers from East China Normal University, Alibaba, Huashan Hospital, Northeastern University, and others.
  • Evaluation is hosted on the Tianchi platform.
  • Community discussion channels include DingTalk and WeChat groups.

Licensing & Compatibility

  • Resources are for academic research use only; commercial use is strictly prohibited.
  • The project is based on the CBLUE benchmark.

Limitations & Caveats

  • The project explicitly forbids using public LLM APIs (GPT-4, ChatGPT, etc.) for test set predictions, unless the participants are themselves the developers of that model.
  • Participants must disclose their training methods and data sources.
  • The "Open Source Track" requires adherence to specific data and model licensing for training.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 19 stars in the last 90 days
